securisec

Reputation: 4031

Python logging into file as a dictionary or JSON

I am trying to set up logging so that I can log to both stdout and to a file. This I have accomplished using the following code:

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)-15s %(levelname)-8s %(message)s',
    datefmt='%a, %d %b %Y %H:%M:%S',
    handlers=[logging.FileHandler(path), logging.StreamHandler()])

The output of this is something like this:

2018-05-02 18:43:33,295 DEBUG    Starting new HTTPS connection (1): google.com
2018-05-02 18:43:33,385 DEBUG    https://google.com:443 "GET / HTTP/1.1" 301 220
2018-05-02 18:43:33,389 DEBUG    Starting new HTTPS connection (1): www.google.com
2018-05-02 18:43:33,490 DEBUG    https://www.google.com:443 "GET / HTTP/1.1" 200 None

What I am trying to accomplish is to log this output to a file, not as it prints to stdout, but as a dictionary or JSON object, similar to this (while keeping stdout as it is at the moment):

[{'time': '2018-05-02 18:43:33,295', 'level': 'DEBUG', 'message': 'Starting new HTTPS connection (1): google.com'}, {...}, {...}]

Is this doable? I understand that I can post-process this log file after my process is finished, but I am looking for a more elegant solution because certain things I am logging are quite big objects themselves.

Upvotes: 54

Views: 107759

Answers (9)

Jean-Francois T.

Reputation: 12930

You could also design a simple formatter like this one (which basically creates a dictionary and serialises it to JSON format).

Then add a handler that dumps to a file with this formatter, plus a regular handler that prints to the console, and you are good to go.

Here is an example where I want to load all data in a Pandas DataFrame after execution:

import json
import logging
from logging import LogRecord

import pandas as pd


class JsonFormatter(logging.Formatter):
    """Formatter to dump error message into JSON"""

    def format(self, record: LogRecord) -> str:
        record_dict = {
            "level": record.levelname,
            "date": self.formatTime(record),
            "message": record.getMessage(),
            "module": record.module,
            "lineno": record.lineno,
        }
        return json.dumps(record_dict)



logger = logging.getLogger()
logger.setLevel(logging.INFO)

json_h = logging.FileHandler("errors.ndjson", mode="w")
json_h.setFormatter(JsonFormatter())
logger.addHandler(json_h)

stream_h = logging.StreamHandler()
stream_h.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(stream_h)


logger.error("""some error with "double quotes" and 'single quotes'""")
logger.critical("some critical text")
logger.warning("some warning")
logger.info("some info")

df = pd.read_json("errors.ndjson", lines=True)
print(df.dtypes)  # type: ignore
print(df)

You could also use the dedicated module Python JSON Logger:

...
from pythonjsonlogger import jsonlogger


log_format = "%(asctime)s - %(module)s#%(lineno)d - %(levelname)s - %(message)s"
formatter = jsonlogger.JsonFormatter(
    fmt=log_format, rename_fields={"levelname": "level", "asctime": "date"}
)

logger = logging.getLogger()

json_h = logging.FileHandler("errors.ndjson", mode="w")
json_h.setFormatter(formatter)
logger.addHandler(json_h)
...

Note that the simple implementation has the following limitations compared to python-json-logger:

  • a hard-coded choice of attributes (but this could be parametrized via an argument of the Formatter class; a sketch of that appears after the output below)
  • no support for tracebacks / exceptions (i.e. exc_info)
  • does not display extra information (like, for example, in logger.info("message", extra={"field": "value"}))

Output:

{"level": "ERROR", "date": "2023-09-08 15:05:26,816", "message": "some error with \"double quotes\" and 'single quotes'", "module": "try_json_logger", "lineno": 50}
{"level": "CRITICAL", "date": "2023-09-08 15:05:26,816", "message": "some critical text", "module": "try_json_logger", "lineno": 51}
{"level": "WARNING", "date": "2023-09-08 15:05:26,816", "message": "some warning", "module": "try_json_logger", "lineno": 52}
{"level": "INFO", "date": "2023-09-08 15:05:26,816", "message": "some info", "module": "try_json_logger", "lineno": 53}

Upvotes: 3

jamburg

Reputation: 11

I wanted to save the full LogRecord object so that I could later inspect my log with maximum integration with the logging module. So I inspected the object like this:

from logging import DEBUG, Handler, LogRecord, getLogger

class HandlerJson(Handler):

    def emit(self, record: LogRecord) -> None:
        # collect every non-dunder attribute of the LogRecord
        json_data = {}
        for attr in filter(lambda attr: not attr.endswith("__"), dir(record)):
            json_data[attr] = getattr(record, attr)
        # getMessage is a method, not record data, so drop it
        del json_data["getMessage"]
        print(json_data)

This is a subclass of Handler; emit is an overridden method that is called for every LogRecord. dir() returns all attributes and methods of the object. I am excluding special methods, and also deleting the getMessage method, which is not needed for a JSON object representation.
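
Note that emit() above only prints the dict; here is a sketch of a variant (the HandlerJsonFile name is mine) that appends each record as one JSON line instead, with default=str covering non-serialisable attributes. It assumes HandlerJson and LogRecord from above are in scope:

import json

class HandlerJsonFile(HandlerJson):

    def __init__(self, filename: str) -> None:
        super().__init__()
        self.filename = filename

    def emit(self, record: LogRecord) -> None:
        json_data = {attr: getattr(record, attr)
                     for attr in dir(record) if not attr.endswith("__")}
        json_data.pop("getMessage", None)  # drop the method, keep only data
        with open(self.filename, "a") as f:
            f.write(json.dumps(json_data, default=str) + "\n")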

This can be nicely integrated with logging like this:

logger = getLogger(__name__)
logger.setLevel(DEBUG)
handle_json = HandlerJson()
logger.addHandler(handle_json)

logger.info("my info")

The result looks like this:

{
    'args': (),
    'created': 1639925351.0648422,
    'exc_info': None,
    'exc_text': None,
    'filename': 'my_logging.py',
    'funcName': 'restore_log_from_disk',
    'levelname': 'INFO',
    'levelno': 20,
    'lineno': 142,
    'module': 'my_logging',
    'msecs': 64.84222412109375,
    'msg': 'my info',
    'name': '__main__',
    'pathname': '/home/jindrich/PycharmProjects/my_logging.py',
    'process': 146331,
    'processName': 'MainProcess',
    'relativeCreated': 1.6417503356933594,
    'stack_info': None,
    'thread': 140347192436544,
    'threadName': 'MainThread'
}

Then you can load the data from disk and recreate the objects after some digging in the docs.
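
A sketch of that last step, assuming each dict was dumped as one JSON object per line (the my_logging.ndjson file name is hypothetical): logging.makeLogRecord rebuilds a LogRecord from such a dict:

import json
import logging

with open("my_logging.ndjson") as f:
    for line in f:
        # makeLogRecord creates a fresh LogRecord, then overrides its
        # attributes with the saved ones
        record = logging.makeLogRecord(json.loads(line))
        print(record.getMessage(), record.levelname)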

Upvotes: 0

securisec

Reputation: 4031

So based on @abarnert's suggestion, I found this link, which provided a good path to making this concept work for the most part. The code as it stands is:

import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

file_handler = logging.FileHandler('foo.log')
stream_handler = logging.StreamHandler()

stream_formatter = logging.Formatter(
    '%(asctime)-15s %(levelname)-8s %(message)s')
file_formatter = logging.Formatter(
    "{'time': '%(asctime)s', 'name': '%(name)s', "
    "'level': '%(levelname)s', 'message': '%(message)s'}"
)

file_handler.setFormatter(file_formatter)
stream_handler.setFormatter(stream_formatter)

logger.addHandler(file_handler)
logger.addHandler(stream_handler)

Although it does not fully meet the requirement, it doesn't require any preprocessing, and it allows me to create two log handlers.

Afterwards, I can use something like:

from ast import literal_eval

with open('foo.log') as f:
    for line in f:
        for key, value in literal_eval(line).items():
            # do something with each key/value pair
            ...

to pull dict objects instead of fighting with improperly formatted JSON.
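
A variation on the same idea: if the format string uses escaped double quotes instead, each line parses with json.loads, though a message containing a raw double quote will still break it, since nothing escapes the interpolated values. A sketch:

import json
import logging

file_formatter = logging.Formatter(
    '{"time": "%(asctime)s", "name": "%(name)s", '
    '"level": "%(levelname)s", "message": "%(message)s"}'
)

with open('foo.log') as f:
    for line in f:
        entry = json.loads(line)  # breaks if a message contains a raw double quote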

Still am hoping for a more elegant solution.

Upvotes: 17

David G.

Reputation: 43

I wanted to have JSON output so that I could handle it more nicely in Promtail and Loki. I just updated the formatter in my logging.json, which I use as a dictConfig:

"formatters": {
    "normalFormatter": {
        "format": "{\"time\": \"%(asctime)s\", \"name\": \"[%(name)s]\", \"levelname\": \"%(levelname)s\", \"message\": \"%(message)s\"}"
    }
}

Load the config and get the root logger like:

import json
import logging
import logging.config


# set up logging from the dictConfig file
with open("logging.json") as f:
    config_dict = json.load(f)
    logging.config.dictConfig(config_dict)

# get a logger (handlers attached to the root logger apply via propagation)
logger = logging.getLogger(__name__)

If you want to use dictConfig, please make sure you give all the necessary fields.

https://docs.python.org/3/library/logging.config.html

You may want to define a formatter, a handler (which uses the formatter) and a logger (which uses the handler).

e.g. logging.json

{
    "version": 1,
    "disable_existing_loggers": false,
    "formatters": {
        "normalFormatter": {
            "format": "{\"time\": \"%(asctime)s\", \"name\": \"[%(name)s]\", \"levelname\": \"%(levelname)s\", \"message\": \"%(message)s\"}"
        }
    },
    "handlers": {
        "demohandler": {
            "level": "INFO",
            "formatter": "normalFormatter",
            "class": "logging.handlers.TimedRotatingFileHandler",
            "filename": "./files/logs/YourLogFile.log",
            "when": "d",
            "interval": 30,
            "backupCount": 4,
            "utc": true
        }
    },
    "loggers": {
        "root": {
            "handlers": ["demohandler"],
            "level": "INFO"
        },
        "someModule": {
            "handlers": ["demohandler"],
            "level": "INFO",
            "propagate": 0
        }
    }
}
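
To wire this up, load the file and hand it to dictConfig; a minimal usage sketch, assuming the config above is saved as logging.json:

import json
import logging
import logging.config

with open("logging.json") as f:
    logging.config.dictConfig(json.load(f))

# each record ends up as one JSON-shaped line in YourLogFile.log
logging.getLogger("someModule").info("hello")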

Upvotes: 4

Olivier Desnoë

Reputation: 51

I could achieve this result using this custom formatter:

import json
import logging


class CustomJsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        super(CustomJsonFormatter, self).format(record)
        output = {k: str(v) for k, v in record.__dict__.items()}
        return json.dumps(output)


cf = CustomJsonFormatter()
sh = logging.StreamHandler()
sh.setFormatter(cf)

logger = logging.getLogger("my.module")
logger.addHandler(sh)

# simple json output
logger.warning("This is a great %s!", "log")
# enrich json output
logger.warning("This is an even greater %s!", "log", extra={'foo': 'bar'})

Output:

{"name": "my.module", "msg": "This is a great %s!", "args": "('log',)", "levelname": "WARNING", "levelno": "30", "pathname": "/Users/olivier/test.py", "filename": "test.py", "module": "test", "exc_info": "None", "exc_text": "None", "stack_info": "None", "lineno": "20", "funcName": "<module>", "created": "1661868378.5048351", "msecs": "504.8351287841797", "relativeCreated": "1.3060569763183594", "thread": "4640826880", "threadName": "MainThread", "processName": "MainProcess", "process": "81360", "message": "This is a great log!"}
{"name": "my.module", "msg": "This is an even greater %s!", "args": "('log',)", "levelname": "WARNING", "levelno": "30", "pathname": "/Users/olivier/test.py", "filename": "test.py", "module": "test", "exc_info": "None", "exc_text": "None", "stack_info": "None", "lineno": "22", "funcName": "<module>", "created": "1661868378.504962", "msecs": "504.9619674682617", "relativeCreated": "1.4328956604003906", "thread": "4640826880", "threadName": "MainThread", "processName": "MainProcess", "process": "81360", "foo": "bar", "message": "This is an even greater log!"}

Upvotes: 5

Avi

Reputation: 1594

I was able to create it using the python-json-logger lib; it was simple and really easy to use.

Django


from pythonjsonlogger import jsonlogger

##This is to add custom keys
class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super(CustomJsonFormatter, self).add_fields(log_record, record, message_dict)
        log_record['level'] = record.levelname


# Logging
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'json': {
            '()': CustomJsonFormatter, # if you want to use custom logs class defined above
            # '()': jsonlogger.JsonFormatter, # without custom logs
            'format': '%(level)s %(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s %(threadName)s %(special)s %(run)s\
                %(name)s %(created)s %(processName)s %(relativeCreated)d %(funcName)s %(levelno)d %(msecs)d %(pathname)s %(lineno)d %(filename)s'
        },
    },
    'handlers': {
        'null': {
            'class': 'logging.NullHandler',
        },
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'json' 
        },
    },
    .....
}

Flask

from logging.config import dictConfig
from pythonjsonlogger import jsonlogger
import os

# This will set global root logging config across all the modules using in the app

##This is to add custom keys
class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super(CustomJsonFormatter, self).add_fields(log_record, record, message_dict)
        log_record['level'] = record.levelname


def setup():
    LOG_FILE = '/tmp/app/app.json'
    if not os.path.exists(os.path.dirname(LOG_FILE)):  # if LOG_FILE dir doesn't exist, creates it.
        os.makedirs(os.path.dirname(LOG_FILE))
    dictConfig({
        'version': 1,
        'formatters': {
            'default': {
                'format': '%(asctime)s - %(module)s - %(levelname)s - %(message)s',
            },
            'json': {
            '()': CustomJsonFormatter, # if you want to use custom logs class defined above
            # '()': jsonlogger.JsonFormatter, # without custom logs
            'format': '%(level)s %(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s %(threadName)s %(special)s %(run)s\
                %(name)s %(created)s %(processName)s %(relativeCreated)d %(funcName)s %(levelno)d %(msecs)d %(pathname)s %(lineno)d %(filename)s'
        },
        },
        'handlers': {'file': {
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': LOG_FILE,
            'maxBytes': 10485760,
            'backupCount': 5,
            'formatter': 'json'
        },
        'console':{
            'class':'logging.StreamHandler',
            'formatter': 'json'
        }},
        'root': {
            'level': 'INFO',
            'handlers': ['file', 'console']
        }
    })
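
Note that %(special)s and %(run)s in both format strings are custom attributes, not standard LogRecord fields; they are presumably meant to be supplied per call via extra (with python-json-logger, fields absent from a record typically come out as null). A hypothetical call:

import logging

logger = logging.getLogger(__name__)
# 'special' and 'run' match the custom fields referenced in the format strings above
logger.info("payment processed", extra={"special": "promo", "run": 42})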

Hope this helps with an easy setup.

Upvotes: 6

Bogdan Mircea

Reputation: 947

I too dealt with this, and I personally believe that an external library might be overkill for something like this.

I studied the code behind logging.Formatter a bit and came up with a subclass which, in my case, does the trick (my goal was to have a JSON file that Filebeat can read to further log into Elasticsearch).

Class:

import logging
import json


class JsonFormatter(logging.Formatter):
    """
    Formatter that outputs JSON strings after parsing the LogRecord.

    @param dict fmt_dict: Key: logging format attribute pairs. Defaults to {"message": "message"}.
    @param str time_format: time.strftime() format string. Default: "%Y-%m-%dT%H:%M:%S"
    @param str msec_format: Millisecond formatting. Appended at the end. Default: "%s.%03dZ"
    """
    def __init__(self, fmt_dict: dict = None, time_format: str = "%Y-%m-%dT%H:%M:%S", msec_format: str = "%s.%03dZ"):
        self.fmt_dict = fmt_dict if fmt_dict is not None else {"message": "message"}
        self.default_time_format = time_format
        self.default_msec_format = msec_format
        self.datefmt = None

    def usesTime(self) -> bool:
        """
        Overwritten to look for the attribute in the format dict values instead of the fmt string.
        """
        return "asctime" in self.fmt_dict.values()

    def formatMessage(self, record) -> dict:
        """
        Overwritten to return a dictionary of the relevant LogRecord attributes instead of a string. 
        KeyError is raised if an unknown attribute is provided in the fmt_dict. 
        """
        return {fmt_key: record.__dict__[fmt_val] for fmt_key, fmt_val in self.fmt_dict.items()}

    def format(self, record) -> str:
        """
        Mostly the same as the parent's class method, the difference being that a dict is manipulated and dumped as JSON
        instead of a string.
        """
        record.message = record.getMessage()
        
        if self.usesTime():
            record.asctime = self.formatTime(record, self.datefmt)

        message_dict = self.formatMessage(record)

        if record.exc_info:
            # Cache the traceback text to avoid converting it multiple times
            # (it's constant anyway)
            if not record.exc_text:
                record.exc_text = self.formatException(record.exc_info)

        if record.exc_text:
            message_dict["exc_info"] = record.exc_text

        if record.stack_info:
            message_dict["stack_info"] = self.formatStack(record.stack_info)

        return json.dumps(message_dict, default=str)

Usage:

The formatter must simply be passed to the logging handler.

    json_handler = logging.FileHandler("foo.json")
    json_formatter = JsonFormatter({"level": "levelname", 
                                    "message": "message", 
                                    "loggerName": "name", 
                                    "processName": "processName",
                                    "processID": "process", 
                                    "threadName": "threadName", 
                                    "threadID": "thread",
                                    "timestamp": "asctime"})
    json_handler.setFormatter(json_formatter)

Explanation:

While logging.Formatter takes a string which it interpolates to output the formatted log record, the JsonFormatter takes a dictionary where each key will be the key of the logged value in the JSON string, and each value is the name of a LogRecord attribute that can be logged. (List available in the docs here)

Main "problem" would be parsing dates and timestamps, and the default formatter implementation has these class attributes, default_time_format and default_msec_format.

default_time_format gets passed to time.strftime(), and default_msec_format is interpolated to append milliseconds, as time.strftime() doesn't provide formatting options for those.

The principle is that those are now instance attributes which can be provided in the form of time_format and msec_format to customize how the parent class's formatTime() method (unchanged, as it's not overridden) behaves.

You could technically override it if you want to customize time formatting, but I personally found that using something else would either be redundant or limit the actual formatting options. But feel free to adjust it to your needs.
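
For instance, a minimal sketch of such an override (the IsoJsonFormatter name is mine), assuming the JsonFormatter class above is in scope:

from datetime import datetime, timezone


class IsoJsonFormatter(JsonFormatter):
    """Hypothetical subclass: ISO-8601 UTC timestamps instead of strftime formatting."""

    def formatTime(self, record, datefmt=None):
        # record.created is the POSIX timestamp set by the logging machinery
        return datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat()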

Output:

An example JSON record logged by the formatting options above, with the default time formatting options set in the class, would be:

{"level": "INFO", "message": "Starting service...", "loggerName": "root", "processName": "MainProcess", "processID": 25103, "threadName": "MainThread", "threadID": 4721200640, "timestamp": "2021-12-04T08:25:07.610Z"}

Upvotes: 46

deesolie

Reputation: 1062

If you don't mind pip installing a module to help, there is the json_log_formatter module. The JSON output does have more attributes than requested. The repo mentions customizing the outputted attributes, but I have not integrated that into a working example just yet (a sketch based on the repo's README follows the code below).

json_log_formatter

import logging
import json_log_formatter

# basic logging setup
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter(fmt='%(asctime)-15s %(levelname)-8s %(message)s',
                              datefmt='%a, %d %b %Y %H:%M:%S')


# config for the JSON file handler (SOME_PATH is a placeholder for your log file path)
log_file_handler = logging.FileHandler(SOME_PATH, mode='a')
file_formatter = json_log_formatter.VerboseJSONFormatter()
log_file_handler.setFormatter(file_formatter)
log_file_handler.setLevel(logging.INFO)
logger.addHandler(log_file_handler)


# config for the stream handler
log_stream_handler = logging.StreamHandler()
log_stream_handler.setFormatter(formatter)
log_stream_handler.setLevel(logging.INFO)
logger.addHandler(log_stream_handler)
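
As for customizing the outputted attributes, the repo's README describes subclassing JSONFormatter and overriding json_record(); a sketch along those lines (untested, as noted above):

import json_log_formatter


class CustomisedJSONFormatter(json_log_formatter.JSONFormatter):
    def json_record(self, message: str, extra: dict, record) -> dict:
        # pick exactly the fields you want in the output
        extra['message'] = message
        extra['level'] = record.levelname
        extra['name'] = record.name
        return extra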

Upvotes: 1

Kobe Janssens

Reputation: 318

With this code you can add the full traceback, timestamp and level to a JSON file of your choice.

import json
import traceback
from datetime import datetime, timezone

def addLogging(logDict: dict):
    loggingsFile = 'loggings.json'

    try:
        with open(loggingsFile) as f:
            data = json.load(f)
    except FileNotFoundError:
        data = []  # start a fresh list if the log file doesn't exist yet

    data.append(logDict)

    with open(loggingsFile, 'w') as f:
        json.dump(data, f)

def currentTimeUTC():
    # datetime.now() alone would return local time, not UTC
    return datetime.now(timezone.utc).strftime('%d/%m/%Y %H:%M:%S')

try:
    print(5 / 0)
except ZeroDivisionError:
    fullTraceback = str(traceback.format_exc())
    addLogging({'timestamp': currentTimeUTC(), 'level': 'error', 'traceback': fullTraceback})

Output:

[
    {
        "timestamp": "09/06/2020 17:38:00",
        "level": "error",
        "traceback": "Traceback (most recent call last):\n  File \"d:testFullTraceback.py\", line 19, in <module>\n    print(5/0)\nZeroDivisionError: division by zero\n"
    }
]
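
For reference, the standard logging machinery can capture the same traceback through exc_info, which the JSON formatters in the other answers then serialise; a minimal sketch:

import logging

logger = logging.getLogger(__name__)

try:
    print(5 / 0)
except ZeroDivisionError:
    # logger.exception logs at ERROR level and attaches the current traceback
    logger.exception("division failed")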

Upvotes: 2
