Custom Processing on Python logging messages in a generic way

Question

I am trying to figure out the best approach to apply some custom processing on Python logging messages with minimal impact to our codebase.

The problem is this: we have many different projects logging a lot of things, and some of those messages contain AWS keys. As a security requirement, we need to strip all AWS keys out of the logs, and there are multiple ways to go about this:

1. The naive approach would be to go into each and every project and modify each logging call to strip out keys manually. This is the least preferred approach, as it is the most manual.
2. Implement a separate module that provides the same functions as the logging module (info, error, ...), where each function first applies a regex to filter out AWS keys and then calls the actual logging method behind the scenes. Each project could then switch to something like import custom_logging_module as logging, and none of the logging calls would need to change. The drawback of this approach, though, is that every logging call appears to come from this module in the log, so you can't track where your messages originate from.
3. Not sure in what form yet, but it sounds like it would be possible to implement a custom Logger or LogRecord and register it when initializing logging. This wouldn't have the problem of the previous approach.
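For comparison, the second approach above could be sketched as the thin wrapper module below. The module name custom_logging_module comes from the question; the function bodies and the placeholder key pattern (borrowed from the answer) are assumptions. On Python 3.8+ the logging methods accept a stacklevel argument, and passing stacklevel=2 makes the record report the wrapper's caller instead of the wrapper itself, which mitigates the origin-tracking drawback; older versions have no such argument.

```python
# custom_logging_module.py (hypothetical): approach #2 as a thin wrapper
import logging
import re
from collections.abc import Mapping

# Placeholder key pattern (assumption); adapt it to the real key format
_KEY_PATTERN = re.compile(r'\b[PS]K_\w+\b', re.I)
_root = logging.getLogger()


def _redact(msg, args):
    # Merge args into msg first, so keys passed as %s arguments are caught too
    if len(args) == 1 and isinstance(args[0], Mapping) and args[0]:
        args = args[0]  # mirror logging's support for a single dict argument
    text = str(msg) % args if args else str(msg)
    return _KEY_PATTERN.sub('-- key redacted --', text)


def debug(msg, *args, **kwargs):
    # stacklevel=2 (Python 3.8+) attributes the record to our caller,
    # so filename/lineno/funcName no longer point at this module
    kwargs.setdefault('stacklevel', 2)
    _root.debug(_redact(msg, args), **kwargs)


def info(msg, *args, **kwargs):
    kwargs.setdefault('stacklevel', 2)
    _root.info(_redact(msg, args), **kwargs)


def error(msg, *args, **kwargs):
    kwargs.setdefault('stacklevel', 2)
    _root.error(_redact(msg, args), **kwargs)
```

One trade-off of this design: the wrapper formats and scans every message eagerly, even when the level is disabled, whereas the logging module normally defers formatting until a handler needs it.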

I have done some research on approach #3 but couldn't really find a way to do this. Does anyone have experience applying some custom processing on logging messages that would apply to this use case?

Vinay Sajip · Accepted Answer

You could use a custom LogRecord class to achieve this, as long as you could identify keys in text unambiguously. For example:

```python
import logging
import re

KEY = 'PK_SOME_PUBLIC_KEY'
SECRET_KEY = 'SK_SOME_PRIVATE_KEY'

class StrippingLogRecord(logging.LogRecord):

    # Matches tokens such as PK_... or SK_... (case-insensitive)
    pattern = re.compile(r'\b[PS]K_\w+\b', re.I)

    def getMessage(self):
        # Build the final message (msg % args) as usual, then redact keys
        message = super(StrippingLogRecord, self).getMessage()
        message = self.pattern.sub('-- key redacted --', message)
        return message

if hasattr(logging, 'setLogRecordFactory'):
    # Python 3.2+ has this
    logging.setLogRecordFactory(StrippingLogRecord)
else:
    # Older versions (e.g. 2.x) need monkey-patching
    logging.LogRecord = StrippingLogRecord

logging.basicConfig(level=logging.DEBUG)
logging.debug('Message with a %s', KEY)         # DEBUG:root:Message with a -- key redacted --
logging.debug('Message with a %s', SECRET_KEY)  # DEBUG:root:Message with a -- key redacted --
```

In my example I've assumed you could use a simple regex to spot keys; if that's not workable, a more sophisticated detection method could be used instead.
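For the AWS case in the question, one option is to swap the placeholder regex for patterns shaped like AWS credentials. The sketch below is Python 3 only and assumes that access key IDs are 20 uppercase alphanumeric characters starting with AKIA (long-term) or ASIA (temporary), and that secret access keys are 40-character runs of base64 characters. Both patterns are assumptions: the secret-key pattern will also redact unrelated 40-character tokens, such as hex git commit hashes. The names AWS_KEY_PATTERNS, redact_aws_keys and AwsStrippingLogRecord are hypothetical.

```python
import logging
import re

# Assumed shapes of AWS credentials; adjust them to match your own data
AWS_KEY_PATTERNS = [
    # Access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 characters
    re.compile(r'\b(?:AKIA|ASIA)[0-9A-Z]{16}\b'),
    # Secret access keys: exactly 40 base64 characters, not part of a longer run
    re.compile(r'(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+=])'),
]


def redact_aws_keys(message):
    """Replace anything that looks like an AWS key ID or secret key."""
    for pattern in AWS_KEY_PATTERNS:
        message = pattern.sub('-- key redacted --', message)
    return message


class AwsStrippingLogRecord(logging.LogRecord):
    def getMessage(self):
        # Same idea as StrippingLogRecord, with the AWS-shaped patterns
        return redact_aws_keys(super().getMessage())


logging.setLogRecordFactory(AwsStrippingLogRecord)  # Python 3.2+
```

This would be installed in place of the factory from the answer, not in addition to it, since only one record factory is active at a time.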

Note that the above code must run before any of the code which logs keys, because only records created after the new LogRecord class is installed get redacted.
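One gap worth knowing about: overriding getMessage() only scrubs the message text. If a key ends up in an exception (for example, one logged with logger.exception()), the traceback is formatted separately and is not redacted. A different technique, not the one from the answer, is to redact the fully formatted output in a logging.Formatter subclass attached to each handler. The sketch below assumes Python 3; RedactingFormatter is a hypothetical name and the pattern is the answer's placeholder.

```python
import logging
import re


class RedactingFormatter(logging.Formatter):
    """Formats the record normally, then scrubs keys from the whole output,
    including any traceback and stack info the base class appends."""

    # Placeholder key pattern from the answer; substitute your own
    pattern = re.compile(r'\b[PS]K_\w+\b', re.I)

    def format(self, record):
        return self.pattern.sub('-- key redacted --', super().format(record))


handler = logging.StreamHandler()
handler.setFormatter(RedactingFormatter('%(levelname)s:%(name)s:%(message)s'))
logging.getLogger().addHandler(handler)
```

This only protects handlers that use this formatter: the base class caches the unredacted traceback text on the record (record.exc_text), so any other handler with a plain formatter would still emit it.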
