We recently ran into an issue where restarting Rails within Passenger was taking longer and longer as the number of plugins and gems we use has grown. That left 30-90 seconds where the site was unavailable while Passenger restarted the Rails application spawner on every machine. That's far too long if you like to deploy frequently, which we do.

If you can, the best answer might be to move to Unicorn, the latest in a long line of Rails deployment options. It handles the process of migrating requests from old workers to new workers transparently. Awesome.

Our first thought was that Rails was simply loading too slowly. Both Robby and I spent some time profiling Rails boot time, but there was no single culprit. Instead, the load time was spread across the 50+ gems and 30 plugins we use, which made it difficult to improve radically.
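For what it's worth, one way to see where boot time goes is to time each `require`. This is just a sketch of the idea, not the profiling script we actually ran (and `Module#prepend` needs Ruby 2.0+):

```ruby
require 'benchmark'

# Wrap Kernel#require so every file load reports how long it took.
module RequireTimer
  def require(name)
    result = nil
    elapsed = Benchmark.realtime { result = super }
    # Only report slow requires (threshold of 50ms is arbitrary)
    warn format('%8.1f ms  require %s', elapsed * 1000, name) if elapsed > 0.05
    result
  end
end

# Prepending to Object puts the timer ahead of Kernel#require in the lookup chain
Object.prepend(RequireTimer)

# Anything required after this point gets timed, e.g.:
require 'set'
```

Dropping something like this at the top of `boot.rb` gives a rough per-gem breakdown on stderr.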

Next we took a different approach: have Capistrano remove instances from the HAProxy pool and restart the Passenger instances serially. Getting this working took a couple of steps I hadn't seen documented elsewhere (hat tip to Matt Conway's rubber for the serial-task trick):

  1. Change haproxy.cfg to perform a file-based health check for each backend:

        option httpchk GET /haproxy.txt

    In the Rails app, make sure that same file lives in the public directory so the check returns 200 in the normal case. I also needed nginx to return a 404 when the file does not exist, since a failing (non-2xx/3xx) response is what causes HAProxy to remove the instance from the pool. So in the nginx config:
    if (!-f $document_root/haproxy.txt) {
        return 404;
    } 
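    For reference, the relevant backend section of haproxy.cfg might look something like this (server names, addresses, and check timings here are hypothetical, not our exact config):

        backend rails
          option httpchk GET /haproxy.txt
          # With inter 2000 and fall 3, a server is marked DOWN only
          # after 3 consecutive failed checks, i.e. roughly 6 seconds.
          server app1 10.0.0.1:80 check inter 2000 fall 3 rise 2
          server app2 10.0.0.2:80 check inter 2000 fall 3 rise 2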
    
  2. Change the default "deploy:restart" Capistrano task to do the restarts serially. Capistrano normally executes a task on all matching hosts concurrently, so it takes a little hack to force it to run one host at a time.
    desc "Restart Passenger serially"
    task :restart, :roles => :web do
      haproxy_health_file = "#{current_path}/public/haproxy.txt"

      # Restart each Passenger host serially. Passing :hosts to run
      # limits each command to a single server, which is what forces
      # the serial execution.
      self.roles[:web].each do |host|
        # 1. Take it out of the haproxy pool by removing the health file
        run "rm #{haproxy_health_file}", :hosts => host
        # Give HAProxy's health checks time to mark the server down
        sleep(5)
        # 2. Restart passenger
        run "touch #{current_path}/tmp/restart.txt", :hosts => host
        # 3. Ping passenger to warm it up; ignore any curl failure.
        # Note: the commands run under sh, so use "> /dev/null 2>&1",
        # not the bash-only "&>" (which backgrounds curl and skips the wait).
        run "curl -s 'http://localhost:81/' > /dev/null 2>&1; exit 0", :hosts => host
        # 4. Re-add the app to the haproxy pool
        run "touch #{haproxy_health_file}", :hosts => host
      end
    end
    
  3. During deploys you can watch HAProxy's dashboard remove the instance and send requests to the other instances. Be aware that this means not all instances are running the same code at the exact same instant, so database migrations and similar changes may require putting up a maintenance page or coding defensively.
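One detail worth double-checking: the `sleep(5)` in the task above only helps if HAProxy has actually had time to mark the server down. With HAProxy's default check settings (`inter 2000`, `fall 3`), a server is marked DOWN only after three consecutive failed checks, about six seconds. A quick sketch of the arithmetic (the method name is mine, not anything from HAProxy):

```ruby
# Roughly how long HAProxy takes to mark a server DOWN after its
# health check starts failing: one check interval per required failure.
def drain_window_seconds(inter_ms, fall)
  (inter_ms * fall) / 1000.0
end

# HAProxy defaults: inter 2000 (ms), fall 3
puts drain_window_seconds(2000, 3)  # => 6.0
```

So with default check settings, either bump the sleep above 6 seconds or shorten `inter`/`fall` in haproxy.cfg so the instance drains before Passenger restarts.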