How do I handle dyno restarts when using Django?

Question

I want to handle dyno restarts on Heroku according to their description here:

During this time they should stop accepting new requests or jobs and attempt to finish their current requests, or put jobs back on the queue for other worker processes to handle.

From the looks of it, when Python receives SIGTERM and the signal handler is called (per signal.signal), the currently running thread is stopped, so the request is interrupted in the middle of running.

How do I meet both requirements? (stop accepting new requests + finish the current requests)

Peter Brittain · Accepted Answer

EDIT: Added simplified example code, explained ongoing requests/termination better and added gist from CrazyPython.

On the face of it, you have 4 problems to solve. I'll take them in turn and then give some sample code that should help clarify:

Handling SIGTERM

This is simple. You just need to set up a signal handler to note that you need to shut down. PyMOTW (Python Module of the Week) has a good set of examples of how to catch the signal. You could use variants of that code to catch SIGTERM and set a global flag that says you are shutting down.
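As a minimal sketch of that idea (the names shutting_down and handle_sigterm are my own, and using a threading.Event rather than a plain boolean is just one reasonable choice), the handler only records that shutdown was requested and then returns, so the interrupted code carries on:

```python
import signal
import threading

# Hypothetical global flag, set once SIGTERM arrives. An Event is used so
# that any thread can check it safely with is_set().
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: note the shutdown request and return.
    # Execution then resumes wherever the main thread was interrupted.
    shutting_down.set()


# signal.signal() must be called from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)
```

Anything else in the process can then call shutting_down.is_set() to find out whether a SIGTERM has been received.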

Rejecting new requests

Django middleware provides a neat way of hooking any HTTP request to your application. You could create a simple process_request() hook that returns an error page if the global flag (from above) is set.
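A rough sketch of such a hook follows. It is written in the newer callable middleware style (Django 1.10+) rather than as a process_request() method, and the class name, the shutting_down flag, the 503 status and the make_response parameter are all assumptions of mine. Something else, such as a SIGTERM handler, is assumed to set the flag.

```python
import threading

# Hypothetical flag; a SIGTERM handler elsewhere is assumed to call .set().
shutting_down = threading.Event()


def _default_unavailable_response():
    # Imported lazily so this module can be loaded without Django installed.
    from django.http import HttpResponse
    return HttpResponse("Service restarting, please retry.", status=503)


class RejectDuringShutdownMiddleware(object):
    """Return an error response for any request that arrives after shutdown began."""

    def __init__(self, get_response, make_response=_default_unavailable_response):
        # Django passes only get_response; make_response exists for testing.
        self.get_response = get_response
        self.make_response = make_response

    def __call__(self, request):
        if shutting_down.is_set():
            # Short-circuit: the view is never called for this request.
            return self.make_response()
        return self.get_response(request)
```

To use it, you would add the class's dotted path to the MIDDLEWARE setting, ideally near the top so that requests are rejected before any other work is done.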

Completing existing requests

With any new requests stopped, you now have to let your current requests complete. Although you might not believe it right now, this means you simply do nothing and let the program just carry on running as usual after the SIGTERM. Let me expand on that...

The contract with Heroku is that you must complete within 10s of a SIGTERM, or it will send a SIGKILL anyway. That means there is nothing you can do (as a well-behaved application) to ensure that all requests always complete. Consider the 2 cases:

Your application processes all existing requests within 10s. In this case just leaving your program running will let the requests complete. No special code to run the requests is needed - all the threads/processes are already doing what you need!
Your application takes more than 10s for some requests. In this case, there is nothing you can do - it will be terminated with ultimate force by Heroku before the long request completes. If you're thinking you can ignore the SIGKILL, think otherwise... SIGKILL cannot be caught or ignored - see the signals documentation.

In both cases, therefore, the solution is just to let your program carry on running so that as many current requests as possible complete before it is terminated.

Terminating your application

The simplest thing to do might be to wait for the SIGKILL to come along from Heroku 10 seconds later. It's not elegant, but it should be OK because you are rejecting any new requests.

If that's not good enough, you need to track your outstanding requests and use that to decide when you can close down your application. The exact way to close your application will depend on whatever is hosting it, so I can't give you exact guidance there. Hopefully the sample code gives you enough of a pointer, though.
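One possible way to track outstanding requests is sketched below. The RequestTracker class and its method names are hypothetical, and how you actually stop the hosting server once it reports idle is deliberately left out. Each request is wrapped in track(), and the shutdown code calls wait_until_idle() with a deadline safely shorter than the SIGKILL window.

```python
import threading
import time
from contextlib import contextmanager


class RequestTracker(object):
    """Counts in-flight requests and lets shutdown code wait for zero."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    @contextmanager
    def track(self):
        # Wrap the handling of one request: count it in, then out.
        with self._cond:
            self._count += 1
        try:
            yield
        finally:
            with self._cond:
                self._count -= 1
                if self._count == 0:
                    self._cond.notify_all()

    def wait_until_idle(self, timeout):
        """Return True if all requests finished within `timeout` seconds."""
        deadline = time.time() + timeout
        with self._cond:
            while self._count > 0:
                remaining = deadline - time.time()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
            return True
```

If wait_until_idle() returns True you can exit cleanly straight away; if it returns False, some request is still running and you are back to the case where the SIGKILL will end it.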

Sample code

Starting from the signal handler example in PyMOTW, I've beefed up the code to add multiple threads processing requests and a termination manager to catch the signal and allow the app to shut down gracefully. You should be able to run this in Python 2.7 and then try killing the process (for example with kill and the PID it prints).

Building on this example, CrazyPython created this gist to give a concrete implementation in Django.

import signal
import os
import time
import threading
import random


class TerminationManager(object):

    def __init__(self):
        self._running = True
        self._requests = 0
        self._lock = threading.Lock()
        signal.signal(signal.SIGTERM, self._start_shutdown)

    def _start_shutdown(self, signum, stack):
        print('Received: {}'.format(signum))
        self._running = False

    def start_request(self):
        with self._lock:
            self._requests += 1

    def stop_request(self):
        with self._lock:
            self._requests -= 1

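    def is_accepting(self):
        # False once SIGTERM has arrived: no new requests should be started.
        return self._running

    def is_running(self):
        # True until shutdown was requested and all in-flight requests are done.
        return self._running or self._requests > 0

    def running_requests(self):
        return self._requests


class DummyWorker(threading.Thread):

    def __init__(self, app_manager):
        super(DummyWorker, self).__init__()
        self._manager = app_manager

    def run(self):
        # Stop picking up new work as soon as shutdown starts; a request that
        # is already in progress still runs to completion before this check.
        while self._manager.is_accepting():
            # Emulate random work and delay between requests.
            if random.random() > 0.9:
                self._manager.start_request()
                time.sleep(random.randint(1, 3))
                self._manager.stop_request()
            else:
                time.sleep(1)
        print("Stopping worker")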


manager = TerminationManager()
print('My PID is: {}'.format(os.getpid()))

for _ in range(10):
    t = DummyWorker(manager)
    t.start()

# Keep the main thread alive until shutdown is requested and requests drain.
while manager.is_running():
    print('Waiting with {} running requests'.format(manager.running_requests()))
    time.sleep(5)

print('All done!')
