Reputation: 542
I am working on an application in which I may potentially need to log the entire traffic reaching the server. This feature may be turned on or off, or may be used when exceptions are caught.
In any case, I am concerned about the blocking nature of disk I/O operations and their impact on the server's performance. The business logic applied when a request is handled (mostly POST HTTP requests) is asynchronous, in that every network or DB call is executed asynchronously.
On the other hand, I am concerned about the delay to the thread while it waits for the disk I/O operation to complete. The logged messages can range from a few bytes to a few KB, and in some cases a few MB. There is no real need for the thread to pause while data is written to disk: the HTTP request can complete at that point, and there is no reason the IOLoop thread should not work on another task while the data is written.
So my questions are:
Any comments or suggestions are much appreciated,
Erez
Upvotes: 4
Views: 1553
Reputation: 25263
I have ops experience from the time of the question, circa 2016, with a Python 3.4 Tornado 4 application running on a decent bare-metal machine. The application interacted with a few third-party HTTP APIs and logged some of the interactions for potential troubleshooting in the future (which is similar to the OP's requirements). The machine had a RAID of HDDs. As far as I can recall, the application wasn't high-traffic.
Tornado 4 had its own IO loop implementation (Tornado 5+ uses asyncio's), and it had an interesting piece of instrumentation, controlled by IOLoop.set_blocking_log_threshold. Basically, it logged a WARNING record with the stack trace whenever the loop was blocked for longer than threshold seconds. I can still find a couple of screenshots from that time in the Sentry timeline for that very warning, where the threshold was set to 1 second.
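For reference, enabling it looked roughly like this; a minimal sketch assuming Tornado 4.x (the handler and port are arbitrary, and the method is not available on the asyncio-based loops of Tornado 5+):
import tornado.ioloop
import tornado.web

class PingHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("pong")

if __name__ == "__main__":
    app = tornado.web.Application([(r"/ping", PingHandler)])
    app.listen(8888)
    loop = tornado.ioloop.IOLoop.current()
    # Log a WARNING with the current stack trace whenever the loop is blocked
    # for more than 1 second (implemented with signal.SIGALRM)
    loop.set_blocking_log_threshold(1)
    loop.start()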
Most of the warnings had stack traces ending in the logging file handler's flush. It was a rotating and gzipping file handler; the latter may explain what could take longer than a second. In any case, it was desirable for the application to keep full responsibility over logging. The solution was the stdlib pair of logging.handlers.QueueHandler and logging.handlers.QueueListener.
The Python logging cookbook has a dedicated section on Dealing with handlers that block. Here's the example from it (where listener.start starts a thread that reads records off the queue and delegates them to the handler):
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

que = queue.Queue(-1)  # no limit on size
queue_handler = QueueHandler(que)
handler = logging.StreamHandler()
listener = QueueListener(que, handler)
root = logging.getLogger()
root.addHandler(queue_handler)
formatter = logging.Formatter('%(threadName)s: %(message)s')
handler.setFormatter(formatter)
listener.start()
# The log output will display the thread which generated
# the event (the main thread) rather than the internal
# thread which monitors the internal queue. This is what
# you want to happen.
root.warning('Look out!')
listener.stop()
For a real-world reference of a QueueHandler implementation that covers the edge cases, chronologer.client.QueueProxyHandler can be used.
asyncio instrumentation
asyncio has a debug mode.
By default asyncio runs in production mode. In order to ease the development asyncio has a debug mode. [...] When the debug mode is enabled:
- asyncio checks for coroutines that were not awaited and logs them; this mitigates the “forgotten await” pitfall.
- Many non-threadsafe asyncio APIs (such as loop.call_soon() and loop.call_at() methods) raise an exception if they are called from a wrong thread.
- The execution time of the I/O selector is logged if it takes too long to perform an I/O operation.
- Callbacks taking longer than 100ms are logged. The loop.slow_callback_duration attribute can be used to set the minimum execution duration in seconds that is considered “slow”.
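A minimal sketch of enabling it and tuning slow_callback_duration (Python 3.7+; the blocking call is there only to trigger the warning):
import asyncio
import logging
import time

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # warn about anything blocking the loop for > 50 ms
    time.sleep(0.2)  # deliberately block the loop to trigger the warning

logging.basicConfig(level=logging.DEBUG)  # asyncio's warnings go through the "asyncio" logger
asyncio.run(main(), debug=True)  # debug mode can also be enabled with PYTHONASYNCIODEBUG=1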
It may look richer than what Tornado 4 had, but in fact it's not. First, it's not intended for production (and a pretty important metric is missing). Moreover, it's an after-the-fact warning without a stack trace, whereas Tornado's implementation was based on signal.SIGALRM and provided the stack trace at the moment the threshold was hit.
Have you noticed that the warnings didn't go away completely? I can assure you that the logging issue was fixed. What caused these rare issues, to my surprise, was uuid.uuid4, which can block on a machine with an empty entropy pool, but that's another story.
asyncio maintainers about asynchronous file IO, logging and aiofiles library
Upvotes: 1
Reputation: 554
" write asynchronously to a pipe or socket to another process (syslog?"
How can that be? log_request is a normal function, not a coroutine, and all the default Python logging handlers are not driven by the asyncio event loop, so they are not truly asynchronous. This is, IMHO, one of the factors that make Tornado less performant than e.g. aiohttp. Writing to memory or using UDP is fast, but it is not async anyway.
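To illustrate, a hedged sketch (the file name is arbitrary): routing Tornado's access log through a queue keeps the synchronous log_request call cheap, but nothing in it is awaited:
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

# Application.log_request is a plain function that writes to the
# "tornado.access" logger on the IOLoop thread.  Routing that logger through
# a queue makes the call a non-blocking put(); the listener thread does the
# actual file write.  Fast, but still not asynchronous.
log_queue = queue.Queue(-1)
listener = QueueListener(log_queue, logging.FileHandler("access.log"))
listener.start()
logging.getLogger("tornado.access").addHandler(QueueHandler(log_queue))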
Upvotes: 0
Reputation: 22154
For "normal" logging (a few lines per request), I've always found logging directly to a file to be good enough. That may not be true if you're logging all the traffic to the server. The one time I've needed to do something like that I just captured the traffic externally with tcpdump
instead of modifying my server.
If you want to capture it in the process, start by just writing to a file from the main thread. As always, measure things in your own environment before taking drastic action (IOLoop.set_blocking_log_threshold is useful for determining if your logging is a problem).
If writing from the main thread blocks for too long, you can either write to a queue that is processed by another thread, or write asynchronously to a pipe or socket to another process (syslog?).
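For the syslog option, a minimal sketch using the stdlib SysLogHandler (the logger name and sample record are made up):
import logging
from logging.handlers import SysLogHandler

# Records go over a datagram socket to the local syslog daemon, so no disk
# write happens on the calling thread.  "/dev/log" is the usual Linux socket;
# ("localhost", 514) would target a UDP syslog listener instead.
logger = logging.getLogger("traffic")
logger.setLevel(logging.INFO)
logger.addHandler(SysLogHandler(address="/dev/log"))

logger.info("POST /orders 200 2.3KB 12ms")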
Upvotes: 4