Prior to co-founding Appboy, I was a release manager at a large finance company. If someone wanted to release anything to our production environment, my job was to ensure that the release was safe and carried minimal risk. “Safe” varied by product, but it always entailed a few common “checklistable” elements, such as performing code reviews, running user acceptance tests, and ensuring that we had a rollback plan in place.
Some of those steps could be automated, but, as is often the case with large legacy systems, much of the work had to be done manually. With multiple releases each week, release management took up a lot of my time.
When we started to build the engineering process at Appboy, I resolved to automate my previous role as much as possible. The first step toward this was to set up continuous integration infrastructure.
When it came to choosing which CI server to use, I searched for these factors in this order:
A tool that was easy to set up and administer and simple to use
We simultaneously build multiple Ruby projects, a Maven Android project, and an iOS project on the same Jenkins infrastructure (we used to use EC2 for build agents but moved to Mac Minis so that we could also build iOS projects). With Jenkins in place, releases definitely became safer. However, I still had to deploy and test them manually after the fact. Automating that meant moving to continuous deployment: releasing automatically once the tests pass.
Getting to continuous deployment
Before starting to think about continuous deployment, you need to have an easy, automatable way to release your product. With Engine Yard, this is trivial. It’s a one-line command, ey deploy --environment YOUR_APP. But, just because you have an easy way to deploy your code doesn’t mean that you should. Here are the principles and challenges associated with continuous deployment we thought about before setting ours up:
Keep the code quality bar high. Just because tests pass doesn’t mean that the code is written well. After all, the easiest way to make tests pass is to not write them.
Test after the deploy. This is a big one. No matter how closely your test environment matches your production environment, you’re still going to have discrepancies. You need to make sure that after a deploy happens your product remains functional.
Be able to fix problems quickly. If deployments are happening automatically and not on someone’s watch, you need to be able to move quickly to either push forward or roll back as soon as you’re alerted to a problem.
Test the continuous deployment process. A continuous deployment process could stop working for a number of reasons, for example when the underlying host changes and remote host identification fails. You need to be able to detect if the process is broken.
Be able to turn off the continuous deployment process. Sometimes the impact of changes is going to be high. If you need to run a database migration or do maintenance on your servers, you might not want to deploy automatically.
These aren’t hard problems to solve; they just ought to be thought about before continuously deploying your code. Here’s what we did:
For code quality, we use GitHub pull requests to review all code before it gets to staging.
In terms of deploy impact, Engine Yard makes it easy to use Unicorn as a web server for zero-downtime web deploys, and we stagger long-running process restarts across our server fleet to make sure that we have enough capacity in the middle of a deploy to keep the product functional.
Our continuous deployment process runs automated RSpec and Capybara tests against production after each release, although the tests don’t have to run immediately as part of the deploy. We also use monitoring tools like Listerine and Cloudkick to make sure our product is working correctly.
Engine Yard makes rollbacks really easy, so we didn’t have to do anything special on our end. Another one-line call, ey rollback --environment YOUR_APP, and we can revert to the previous state. In fact, our deploy process automatically performs a rollback and notifies us if the smoke tests fail.
For testing the continuous deployment process, we get notified in HipChat if a deployment was attempted but failed.
We approached disabling the continuous deployment process as a one-off need. That is, we don’t necessarily want to disable it entirely, just disable it for certain code merges. Since we use pull requests to merge into the main code base, our deployment script can inspect the merge commit message for directives. We use simple hashtags like #nodeploy to tell our script what to do.
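To make the commit-message directives concrete, here is a minimal Python sketch of how a deploy script might look for a hashtag like #nodeploy. It is an illustration under assumptions, not the script we run: the function names, the KNOWN_DIRECTIVES set, and the matching rules (case-insensitive, hashtag must start a word) are all hypothetical.

```python
import re

# Hypothetical directive set; only #nodeploy is named in the text above.
KNOWN_DIRECTIVES = {"nodeploy"}

# A hashtag counts only at the start of a word and must begin with a letter,
# so "#42" in "Merge pull request #42" and URL fragments are ignored.
HASHTAG = re.compile(r"(?<!\S)#([A-Za-z][\w-]*)")


def parse_directives(commit_message):
    """Return the set of known directives found in a commit message."""
    found = {tag.lower() for tag in HASHTAG.findall(commit_message)}
    return found & KNOWN_DIRECTIVES


def should_deploy(commit_message):
    """Deploy unless the commit message carries the #nodeploy directive."""
    return "nodeploy" not in parse_directives(commit_message)
```

In a build job, the message passed in could come from something like git log -1 --pretty=%B, and the deploy step would simply be skipped when should_deploy returns False.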
Our continuous integration and deployment script
Below is the script that our Jenkins build agents run to build and deploy our projects. Jenkins builds all the branches of our projects and deploys if the commit occurred on one of the branches enabled for continuous deployment. After a deploy has finished, we run smoke tests and roll back if necessary.
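The deploy, smoke test, and rollback sequence described here can also be sketched in a few lines of Python. This is a hypothetical illustration of the control flow only, not the script itself: the four callables are assumed stand-ins for whatever runs the deploy command, the smoke tests, the rollback command, and the chat notification, and the returned state names are made up for the example.

```python
from typing import Callable


def deploy_with_rollback(
    deploy: Callable[[], bool],
    smoke_test: Callable[[], bool],
    rollback: Callable[[], bool],
    notify: Callable[[str], None],
) -> str:
    """Deploy, verify with smoke tests, and roll back on failure.

    Each of the first three callables returns True on success.
    Returns a (hypothetical) name for the final state.
    """
    if not deploy():
        # A deploy that was attempted but failed should never be silent.
        notify("Deploy was attempted but failed")
        return "deploy_failed"

    if smoke_test():
        return "deployed"

    # Smoke tests failed against the fresh release: revert and tell someone.
    if rollback():
        notify("Smoke tests failed; rolled back to the previous release")
        return "rolled_back"

    notify("Smoke tests failed and the rollback also failed")
    return "rollback_failed"
```

Passing the steps in as callables keeps the flow testable without touching a real environment; in practice each one would wrap a shell command and report whether it exited successfully.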
We use continuous deployment for all parts of our product and love it. I now spend almost no time on routine deployments and only get involved for major releases that require coordination across all of our systems. Since we’re using git-flow, our develop branch automatically deploys to our staging environment while our master branch goes to production.
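The branch-to-environment rule is easy to express as a lookup table. The following Python sketch assumes the git-flow mapping just described; the environment names, the function names, and the prefix handling (build servers often report a branch as origin/master or refs/heads/master) are assumptions for illustration.

```python
from typing import Optional

# git-flow mapping described above; environment names are placeholders.
DEPLOY_TARGETS = {
    "develop": "staging",
    "master": "production",
}


def normalize_branch(ref: str) -> str:
    """Strip a leading remote or ref prefix, e.g. "origin/master" -> "master"."""
    for prefix in ("refs/heads/", "origin/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


def environment_for(ref: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None if it only builds."""
    return DEPLOY_TARGETS.get(normalize_branch(ref))
```

Any branch that is not in the table, such as a feature branch, still gets built and tested but maps to None, so no deploy is attempted for it.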
We also continuously deploy our iOS SDK over the air using TestFlight. I’ll provide that script in a follow-up post on Appboy’s blog in the coming weeks.