Reputation: 542
I am working on an application in which I may potentially need to log the entire traffic reaching the server. This feature may be turned on or off, or may be used when exceptions are caught.
In any case, I am concerned about the blocking nature of disk I/O operations and their impact on the server's performance. The business logic applied when a request is handled (mostly POST HTTP requests) is asynchronous, in that every network or DB call is executed asynchronously.
On the other hand, I am concerned about the delay to the thread while it waits for the disk I/O operation to complete. The logged messages can range from a few bytes to a few KB, and in some cases a few MB. There is no real need for the thread to pause while data is written to disk: the HTTP request can complete at that point, and there is no reason the IOLoop thread should not work on another task while the data is written.
So my questions are:
Any comments or suggestions are much appreciated,
Erez
Upvotes: 4
Views: 1553
Reputation: 25263
I have ops experience from the time of the question, circa 2016, with a Python 3.4 Tornado 4 application running on a decent bare-metal machine. The application interacted with a few third-party HTTP APIs and logged some of the interactions for potential troubleshooting in the future (which is similar to the OP's requirements). The machine had a RAID of HDDs. As far as I can recall, the application wasn't high-traffic.
Tornado 4 had its own IO loop implementation (Tornado 5+ uses asyncio's), and it had an interesting piece of instrumentation, controlled by IOLoop.set_blocking_log_threshold. Basically, it logged a WARNING record with the stack trace whenever the loop was blocked for longer than threshold seconds. I can still find a couple of screenshots from that time in the Sentry timeline for that very warning, where the threshold was set to 1 second.
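For reference, enabling it looked roughly like this; a minimal sketch assuming Tornado 4.x (the handler and port are arbitrary, and the method is not available on the asyncio-based loops of Tornado 5+):
import tornado.ioloop
import tornado.web

class PingHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("pong")

if __name__ == "__main__":
    app = tornado.web.Application([(r"/ping", PingHandler)])
    app.listen(8888)
    loop = tornado.ioloop.IOLoop.current()
    # Log a WARNING with the current stack trace whenever the loop is blocked
    # for more than 1 second (implemented with signal.SIGALRM)
    loop.set_blocking_log_threshold(1)
    loop.start()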
Most of the warnings had stack traces ending in the logging file handler's flush. It was a rotating and gzipping file handler; the latter may explain what could take longer than a second. In any case, it was desirable for the application to keep full responsibility over logging. The solution was the stdlib pair of logging.handlers.QueueHandler and logging.handlers.QueueListener.
The Python logging cookbook has a dedicated section on Dealing with handlers that block. Here's the example from it (where listener.start starts a thread that reads records off the queue and delegates them to the handler):
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

que = queue.Queue(-1)  # no limit on size
queue_handler = QueueHandler(que)
handler = logging.StreamHandler()
listener = QueueListener(que, handler)
root = logging.getLogger()
root.addHandler(queue_handler)
formatter = logging.Formatter('%(threadName)s: %(message)s')
handler.setFormatter(formatter)
listener.start()
# The log output will display the thread which generated
# the event (the main thread) rather than the internal
# thread which monitors the internal queue. This is what
# you want to happen.
root.warning('Look out!')
listener.stop()
For a real-world reference of a QueueHandler implementation that covers the edge cases, chronologer.client.QueueProxyHandler can be used.
asyncio instrumentation
asyncio has a debug mode.
By default asyncio runs in production mode. In order to ease the development asyncio has a debug mode. [...] When the debug mode is enabled:
- asyncio checks for coroutines that were not awaited and logs them; this mitigates the “forgotten await” pitfall.
- Many non-threadsafe asyncio APIs (such as loop.call_soon() and loop.call_at() methods) raise an exception if they are called from a wrong thread.
- The execution time of the I/O selector is logged if it takes too long to perform an I/O operation.
- Callbacks taking longer than 100ms are logged. The loop.slow_callback_duration attribute can be used to set the minimum execution duration in seconds that is considered “slow”.
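A minimal sketch of enabling it and tuning slow_callback_duration (Python 3.7+; the blocking call is there only to trigger the warning):
import asyncio
import logging
import time

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # warn about anything blocking the loop for > 50 ms
    time.sleep(0.2)  # deliberately block the loop to trigger the warning

logging.basicConfig(level=logging.DEBUG)  # asyncio's warnings go through the "asyncio" logger
asyncio.run(main(), debug=True)  # debug mode can also be enabled with PYTHONASYNCIODEBUG=1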
It may look richer than what Tornado 4 had, but in fact it's not. First, it's not intended for production (and a pretty important metric is missing). Moreover, it's an after-the-fact warning without a stack trace, whereas Tornado's implementation was based on signal.SIGALRM and provided the stack trace at the moment the threshold was hit.
Have you noticed that the warnings didn't go away completely? I can assure you that the logging issue was fixed. What caused these rare issues, to my surprise, was uuid.uuid4, which can block on a machine with an empty entropy pool, but that's another story.
asyncio maintainers about asynchronous file IO, logging and aiofiles library
Upvotes: 1
Reputation: 554
" write asynchronously to a pipe or socket to another process (syslog?"
How can that be? log_request is a normal function, not a coroutine, and all the default Python logging handlers are not driven by the asyncio event loop, so they are not truly asynchronous. This is, IMHO, one of the factors that make Tornado less performant than e.g. aiohttp. Writing to memory or using UDP is fast, but it is not async anyway.
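To illustrate, a hedged sketch (the file name is arbitrary): routing Tornado's access log through a queue keeps the synchronous log_request call cheap, but nothing in it is awaited:
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

# Application.log_request is a plain function that writes to the
# "tornado.access" logger on the IOLoop thread.  Routing that logger through
# a queue makes the call a non-blocking put(); the listener thread does the
# actual file write.  Fast, but still not asynchronous.
log_queue = queue.Queue(-1)
listener = QueueListener(log_queue, logging.FileHandler("access.log"))
listener.start()
logging.getLogger("tornado.access").addHandler(QueueHandler(log_queue))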
Upvotes: 0
Reputation: 22154
For "normal" logging (a few lines per request), I've always found logging directly to a file to be good enough. That may not be true if you're logging all the traffic to the server. The one time I've needed to do something like that I just captured the traffic externally with tcpdump
instead of modifying my server.
If you want to capture it in the process, start by just writing to a file from the main thread. As always, measure things in your own environment before taking drastic action (IOLoop.set_blocking_log_threshold is useful for determining if your logging is a problem).
If writing from the main thread blocks for too long, you can either write to a queue that is processed by another thread, or write asynchronously to a pipe or socket to another process (syslog?).
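For the syslog option, a minimal sketch using the stdlib SysLogHandler (the logger name and sample record are made up):
import logging
from logging.handlers import SysLogHandler

# Records go over a datagram socket to the local syslog daemon, so no disk
# write happens on the calling thread.  "/dev/log" is the usual Linux socket;
# ("localhost", 514) would target a UDP syslog listener instead.
logger = logging.getLogger("traffic")
logger.setLevel(logging.INFO)
logger.addHandler(SysLogHandler(address="/dev/log"))

logger.info("POST /orders 200 2.3KB 12ms")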
Upvotes: 4