lixo.org

When Deployments Disappear

I’m building a few micro-services in Ruby (Sinatra) and JavaScript (Node.js) to scrape and reformat a few internal data sources at ThoughtWorks. On top of that, an AngularJS application aggregates that data and presents interesting visualizations. It’s a very nice way to separate where the data comes from and what’s done to it, and Angular makes it fairly straightforward to keep the UI tidy and easy to change.

At one point, I needed two kinds of views on the ThoughtWorks people directory. One was a simple search-as-you-type box, and the other a more complete view with a ton of contact details for each of my colleagues. Eventually, there’d be a third view, with only the basic details, to be presented as a pop-up balloon when you hover over somebody’s name or picture.

It made sense to return two (or maybe three) different kinds of JSON from the services, so the search-as-you-type box could be as fast and use as little bandwidth as possible. This is what I ended up with:

{ "name": "Carlos Villela", id: "cvillela", "aliases": ["cv"] }

Simple, and enough for that search-as-you-type box. I built a service that ferried the queries over to our LDAP servers (thanks to ruby-ldap) and built the JSON response. Performance wasn’t so great, but I could live with it for a while. Time to deploy it!
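For illustration, the service looked roughly like this: a minimal sketch using Sinatra and the net-ldap gem (I actually used ruby-ldap; the host, base DN and attribute names here are hypothetical):

require 'sinatra'
require 'json'
require 'net/ldap'

# Minimal search endpoint: /search.json?q=carlos
get '/search.json' do
  content_type :json
  ldap = Net::LDAP.new(host: 'ldap.example.com', port: 389)
  filter = Net::LDAP::Filter.eq('cn', "*#{params[:q]}*")
  entries = ldap.search(base: 'ou=people,dc=example,dc=com', filter: filter) || []
  entries.map do |entry|
    # Attribute names are hypothetical; the real directory schema will differ.
    { name: entry[:cn].first, id: entry[:uid].first, aliases: entry[:nickname].to_a }
  end.to_json
end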

Well, not so fast – and if you were wondering, here’s where the yak shaving starts. Deploy what? Where? Who would keep it all running smoothly? How would upgrades, security patches and updates be handled?

I could ask for a virtual machine from our operations guys, slap a Linux distribution in there and get started on some Puppet, Chef or Ansible scripts. But I didn’t want to have to maintain that stuff as well as my code. I definitely see the value in automating the setup and configuration of servers, but I was feeling lazier than usual: I wanted something that went “oh, I see you have a Sinatra application and a Gemfile in this git repository. I have a machine that has everything you need installed; let me run the app for you!”, kind of like what Heroku does when you push to an application repository for the first time.

I couldn’t use Heroku, unfortunately (most of those data sources are only accessible from the ThoughtWorks network), but Dokku did the trick quite well. It’s based on Docker, which does most of the heavy lifting, and a bit of bash glue between gitreceive and the Heroku buildpacks. It allows for exactly the same “run git push heroku master and everything else is taken care of” kind of workflow I was looking for, and it’s a breeze to install.

In about a day or so I had a CoreOS machine running Dokku and serving up two different applications: www, with the AngularJS application, styles and templates, and addressbook, with the Sinatra code for LDAP queries. Neat!

Later on, I wanted to build the more complete JSON for the detail view, which I knew I’d need to get from a different data source (LDAP only carries the basics). It would consume the LDAP service too, but then augment it with more contact information. A third application, called contactdetails, was pushed and Dokku took care of deploying it. As I was iterating over the format of the JSON responses, I noticed the performance issues of the addressbook application were getting in the way of testing.

Here’s where everything clicked: I could build a mock addressbook and deploy that without touching the original application, in whatever technology stack I wanted, and run it alongside everything else, by simply changing where the contactdetails application pointed to!

$ git remote add mock git@server:addressbook-mock
$ git checkout -b mock
$ rm -rf *
$ curl http://addressbook.server/index.json -o index.json
$ curl http://addressbook.server/cvillela.json -o cvillela.json
$ git add -A
$ git commit -am 'creating mock service (only index and cvillela supported)'
$ git push mock mock

I now had addressbook-mock to play with, with blazing fast responses, and I could tweak individual JSON responses if I felt like it. I could have as many versions of the application running as I wanted: all I needed to do was to find a suitable subdomain for them.

After a while, I had a handful of different deployments of the addressbook repository. I needed one for blazingly fast and stable responses to test the contactdetails app. I built another to test reliability, and another to ensure timeouts were working well. I don’t even remember what addressbook-broken-json-array was for, but it lived there for a brief period of time.

Eventually, navigating breaking changes to integration points became straightforward: if an application relied on an older version of the wire protocol, I could point it at the last version before the breakage (deployed to addressbook-simple-aliases or addressbook-v3, for example), leading to a fairly frictionless upgrade cadence.
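With Dokku, repointing a consumer is a one-liner, assuming the consumer reads its upstream’s location from the environment (the ADDRESSBOOK_URL variable here is hypothetical):

$ dokku config:set contactdetails ADDRESSBOOK_URL=http://addressbook-v3.server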

In my case, the source data is in a stable format and being held in an external system. That deployment model would break down slightly if I had to deal with applications that owned and kept their data around, as database migrations would easily get in the way of running multiple versions of an application in parallel safely. In a development environment, it’d be enough for the database to sit inside the container (which would effectively isolate each deployment’s database, maybe a nice feature).

Another thing I noticed is that, once deploying multiple versions of an application became as easy as firing them up, I started seeing less value in tests that stub out over-the-network interactions; instead, I made the endpoints themselves behave differently depending on what I wanted to test. That made some of my tests run slower, but I got more confidence that the integration was working well without breaking a sweat.
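As a sketch of what such a test can look like, hitting a deployed mock over HTTP rather than an in-process stub (the URL follows the naming above, and the assertions match the JSON shown earlier):

require 'minitest/autorun'
require 'net/http'
require 'json'

class AddressbookContractTest < Minitest::Test
  def test_entry_has_expected_shape
    # Talks to the deployed mock over the network, not a stub.
    body = Net::HTTP.get(URI('http://addressbook-mock.server/cvillela.json'))
    entry = JSON.parse(body)
    assert_equal 'cvillela', entry['id']
    assert_kind_of Array, entry['aliases']
  end
end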

Site Updates

I just migrated this blog over to Octopress and started hosting most things at DigitalOcean. WordPress and DreamHost served me pretty well over the years, but I wanted something static and a little lower-maintenance. As a result, comments are gone – apologies if you liked a comment on a previous post. I still have the backups and, if you’d like a copy, please let me know. Categories are the usual mess.

I’ll be writing a little bit about the new setup shortly; hope this new workflow makes me want to write more. :)

Privacy Protects Bothersome People [pt-BR]

This post is a translation of an article recently published by Martin Fowler. The original is here.

One of the consequences of the recent Edward Snowden story is an intense discussion about the importance of privacy, in particular whether (or when) privacy should be traded away in the fight against terrorism. To think about this, we need to understand why people’s privacy matters to democracy. We often hear statements like “I have nothing to hide”, or, as a friend of mine put it, “the NSA doesn’t care about insignificant people like you and me”. I may care about my own privacy, but should my personal preference override the needs of our society at large?

For many people, privacy is a fundamental right: they see no reason why the government should pry into their affairs without a far more specific reason than a generalized hunt for possible terrorism. But even if you don’t share the desire to keep some things private from government agents, you should still be concerned about the privacy of your fellow citizens. That’s because this isn’t about me, or about my friend. The value of privacy to us is not primarily about our own privacy, but about the privacy of those who play a more active role in the operation of a democratic system of government. That activity often involves bothering people who hold power, and those with power are prone to use it to suppress the people who bother them. Without all that bother, though, democracy withers.

A few concrete examples should make this easier to see. The first is journalists. Journalism is a profession that is itself visibly suspect under Sturgeon’s law, but the frivolity and disrepute that taint so much journalism don’t invalidate good journalism when it happens. Good journalism helps us understand what is going on in the world, and that kind of journalism often requires asking hard questions of those in power and digging deep for the truths the powerful would rather keep hidden. That digging causes considerable discomfort to those in power, especially when it exposes corruption or incompetence.

A second example is activists trying to change our society. These activists, by their nature, are often trying to change accepted norms of behavior. They may be campaigning for gay rights, or against abortion, or against industrial farming. Their protests and campaigns often run against the interests of those in power, and so their activities are quite bothersome, especially when they gain traction.

Now suppose you are in power, you are being bothered by journalists and activists, and you have access to metadata about everyone’s phone calls. With that information you can find out who your tormentors are talking to, where their sources of information are, and who is supporting them with encouragement and funds. You can act against those people and cut off their support. You can also dig up things about your tormentors and their supporters that can be used to discredit them. A gay rights activist is likely to know many gay people, often in places where homosexuality is considered an abomination; that’s a vulnerability you can exploit. Besides, your tormentors are unlikely to be saints, since most people have something that can be made to sound bad, especially when you can use your influence to amplify a distorted version of the facts. A journalist’s drug habit may not stop them from exposing corruption, but you can use it to sabotage their efforts.

I’m not saying privacy is an absolute necessity. Thwarting criminal activity often means violating privacy: a database of phone calls can be a useful tool for investigating a criminal network. But we should also be aware that tools like these are always open to misuse, so we need to make sure we discuss how to design controls that reduce that misuse to a minimum.

I’m not a committed journalist, nor an activist, so why should I care about any of this? Without good journalists I can’t understand what’s really going on, and so my vote becomes less meaningful. Flourishing corruption stifles economic activity and progress. Activists who seem fringe today may lead us to changes that will be self-evident a few generations from now (there was considerable harassment of those who fought against slavery, or for women’s suffrage, for example). In short, if we can’t protect the privacy of the people who bother the powerful, we lose a vital pillar of our democratic society.

Olympic Build and Packaging Pipelines

When setting up an automated continuous delivery pipeline for our current project, we decided to use RPMs and Yum – the native packaging and software updates platform of our staging, UAT and production environments – instead of more Ruby-esque solutions like Capistrano.

There were several reasons behind that, but by far the most important was the affinity we noticed the operations staff already had with RPMs and Yum: all of the system packages were already managed that way, with a good deal of auditing thrown in (has this file changed since we installed the package? Why?).

Making a decent RPM out of a Rails (or Sinatra) project wasn’t very hard: a bit of head-scratching and a few passes through Maximum RPM later, we had something we could work with.

The main headaches were working out which packages were necessary to build the RPM (the BuildRequires part), figuring out how to reliably install all of the gem dependencies from bundler into the packaged RPM (bundle install --deployment helped), and which changes needed to be made to the application to rely solely on environment variables set by /etc/default/[app]. This way, we wouldn’t have any configuration that could vary across environments coming from the package itself.
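For the curious, here is a heavily trimmed sketch of what such a spec can look like; the names and paths are hypothetical, and the BuildRequires list will vary with your gems and distribution:

Name:          addressbook
Version:       1.0.0
Release:       1%{?dist}
Summary:       Sinatra addressbook service
License:       Proprietary
Source0:       %{name}-%{version}.tar.gz
BuildRequires: ruby, rubygems, rubygem-bundler

%description
Packages the application together with its bundled gems.

%prep
%setup -q

%build
# Vendor every gem into the package so target machines need no network access.
bundle install --deployment --without development test

%install
mkdir -p %{buildroot}/opt/%{name}
cp -a . %{buildroot}/opt/%{name}
# Runtime configuration comes from /etc/default/%{name}, not from the package.

%files
/opt/%{name}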

The next step was setting up a marathon of tests for those RPMs. They already contained unit tested code, and we built more obstacles: functional and integration tests, performance micro-benchmarks, metrics analysis and some manual inspection in UAT. With each step weeding out bad candidates, only truly excellent builds can get to production.

To be able to easily visualize how excellent our builds were, we came up with a simple and effective naming scheme, in time for the Olympics: precious metals.

A unit tested RPM would start out in the “tin” repository. Another step in the pipeline gets triggered and deploys it to a smoke test machine. If that works, it gets promoted to the “bronze” repository. Functional and integration tests cause it to be promoted to “silver”, and so on through “gold” (ready to go) and “platinum” (in production already).
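Promotion itself can be as simple as copying the RPM and regenerating the repository metadata; a sketch, with hypothetical paths:

$ cp /repos/tin/addressbook-1.0.0-1.x86_64.rpm /repos/bronze/
$ createrepo --update /repos/bronze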

To the operations staff, this makes a lot of sense: the production machines fetch updates from “gold”, UAT environments look at “silver” and so on. It’s trivial to configure Yum to do that – chances are you already did it when setting up your distribution – and its output is very easy to read and understand when something goes wrong.
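Pointing an environment at its repository is a few lines in /etc/yum.repos.d/; on a production machine, with a hypothetical internal URL:

[gold]
name=Gold - builds ready for production
baseurl=http://yum.internal.example.com/gold/$basearch
enabled=1
gpgcheck=0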

Looking back at the Capistrano days, my only regret is not having done this sooner!

Logging: A UI Problem

Your logs are part of the UI. They are streams of interesting and actionable events that will be consumed by both machines and humans.

The most useful practice I’ve followed so far is to keep that in mind and act accordingly: understand the computer systems that parse, filter and analyze logs, and talk to all the people who will be notified when something of interest happens. Watch what they do, and ask yourself “how could the output of my application be more helpful in this scenario?”

The parties interested in your application’s logs are usually in conflict: what’s interesting and actionable to developers and testers isn’t so important to production support engineers, and your SQL timing statements are probably seen as junk by the analytics tool looking for security issues.

In order to minimize that, whatever logging framework you’re using should be able to direct those streams of events with pre-defined (and hopefully, easily configurable) filters, and each type of environment or user should be able to have its own configuration.
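A minimal sketch of that idea with Ruby’s standard library Logger: a per-environment table of per-module levels (the APP_ENV variable and module names are hypothetical):

require 'logger'

LEVELS = {
  'development' => { 'addressbook' => Logger::DEBUG, 'default' => Logger::WARN },
  'ci'          => { 'default' => Logger::ERROR },
  'production'  => { 'default' => Logger::INFO }
}

def logger_for(mod)
  env = ENV.fetch('APP_ENV', 'development')
  levels = LEVELS.fetch(env, LEVELS['production'])
  Logger.new($stdout).tap do |log|
    log.progname = mod
    # Anything below this level is filtered out before it reaches the stream.
    log.level = levels.fetch(mod, levels.fetch('default'))
  end
end

logger_for('addressbook').debug('LDAP query took 40ms') # emitted in development only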

Here are a few examples to illustrate the point:

During development, it makes sense to have every debug statement relevant to the module being worked on going to the same stream, while telling the framework to take it easy with all other modules. Events from other modules may be interesting, but they should be filtered out if they’re not actionable, as you’re not going to do anything with them. Changing the filter so you can look at different modules should take no more than a few seconds of work (but may require bouncing a server or two).

While running unit tests on a continuous integration set-up, it may make sense to disable verbose logging altogether: if your automated testing environment is sufficiently mature, at least one of the tests will break and you’ll be able to replay the failure on a development workstation to get at the details. In that kind of environment, not only do you want to be mindful of disk usage, but the events themselves are usually not very actionable anyway.

In production, leave that configuration to people experienced with support: talk to engineers who will get paged at 3am and rushed into a cab if a particular type of error happens, and get their input. They will tell you exactly what kinds of errors they’re interested in on your application in particular. Remember this is probably specific to the domain you’re working on, and that support engineers usually take care of more than one application, and more than one server.

A very common mistake I see (looking at you, JBoss!) is to treat errors that developers should see (a NullPointerException, for example) as important alerts, even though production support people can’t do a thing about them. Don’t wake them up unless there is something they can do to fix the problem, or you risk crying wolf too many times and having them filter out important, actionable notifications, like OutOfMemory errors, low disk space, etc.