Leaving DEBUG logging levels in production code

Question

In the past, when my client called to tell me that a problem had occurred in their software, I would log in to the site and look at the error logs.

However, I found that error logs tend to explain WHAT error occurred but not WHY the error occurred. An understanding of what led to the issue requires previous state information, which is only contained in DEBUG logs.

So almost every time, I would have to change the log level, restart the software, and spend a lot of time trying to recreate the issue.

I decided to leave the production code running at DEBUG log level, but with one adjustment: I capped the maximum journal size at 10GB in journald.conf. On a 500GB machine this seemed fine to me.

Now I can use journalctl --since and journalctl --until to filter the huge log to the time period when my client said the error occurred.

And now I don't waste time re-creating the issue when problems come up.

My question:

What are the implications of leaving production code running on a client's site in a verbose DEBUG level?

I found the answer here inadequate: Log levels in Production

Matthew · Accepted Answer

The two main issues with always logging every log message are:

Performance

Logging is not free in terms of I/O, memory, or CPU. The cost shows up both on the system writing the logs and on the system(s) collecting and querying them.

Noise

If 99.9% of your log lines contain information irrelevant to a problem, then when an actual problem occurs, the few relevant lines may be difficult to find.

Potential Solution

Use request-level logging. I have seen systems that keep the debug-level logs for a specific request in memory and only persist them to disk if an application error happens during that request.

This might not be easy to implement on your given system, and will still incur some penalty of processing all these logs and maintaining them in memory, but it could be a good option to solve your particular problem.
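As a rough illustration of the buffer-then-flush idea, here is a minimal Python sketch using the standard logging module. The names RequestDebugBuffer and handle_request are made up for this example, and it assumes one request is processed at a time on a given logger; a concurrent server would need to keep a separate buffer per request.

```python
import collections
import logging


class RequestDebugBuffer(logging.Handler):
    """Hypothetical handler: keep records in memory and forward them to
    `target` only once a record at `flush_level` or above is seen."""

    def __init__(self, target, capacity=1000, flush_level=logging.ERROR):
        super().__init__(level=logging.DEBUG)
        self.target = target
        self.flush_level = flush_level
        # Bounded buffer: the oldest records are dropped when it is full.
        self.records = collections.deque(maxlen=capacity)

    def emit(self, record):
        self.records.append(record)
        if record.levelno >= self.flush_level:
            self.flush_to_target()

    def flush_to_target(self):
        # Hand every buffered record (including the error) to the real handler.
        while self.records:
            self.target.handle(self.records.popleft())

    def discard(self):
        self.records.clear()


def handle_request(logger, target, work):
    """Run `work(logger)`; persist its buffered logs only if an error is logged."""
    buffer = RequestDebugBuffer(target)
    logger.addHandler(buffer)
    try:
        work(logger)
    except Exception:
        # logger.exception logs at ERROR level, which triggers the flush.
        logger.exception("request failed")
    finally:
        logger.removeHandler(buffer)
        buffer.discard()  # a successful request leaves nothing behind
```

The trade-off mentioned above still applies: every debug record is created and held in memory, it just never reaches the disk unless something goes wrong.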

Alternatively, update your existing logs to contain more contextual information. For example, the default .NET Core logging framework allows you to add contextual information along the call chain:

using (logger.BeginScope("UserId: {UserId}", 123))
{
    // Messages logged inside this scope carry the contextual information UserId: 123
    // (provided the logging provider is configured to include scopes)
    logger.LogInformation("Password does not match");
}
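If you are not on .NET, the same idea can be approximated in other logging libraries. The following is a sketch in Python, assuming the standard logging module; log_scope and ScopeFilter are hypothetical helpers written for this example, not part of any library.

```python
import contextlib
import contextvars
import logging

# Holds the (key, value) pairs of all scopes that are currently open.
_scope = contextvars.ContextVar("log_scope", default=())


@contextlib.contextmanager
def log_scope(**fields):
    """Attach `fields` to every record logged inside the with-block."""
    token = _scope.set(_scope.get() + tuple(fields.items()))
    try:
        yield
    finally:
        _scope.reset(token)  # restore the enclosing scope on exit


class ScopeFilter(logging.Filter):
    """Copy the active scope onto each record as the `scope` attribute."""

    def filter(self, record):
        record.scope = " ".join(f"{key}={value}" for key, value in _scope.get())
        return True
```

To use it, add ScopeFilter() to a handler and reference %(scope)s in that handler's format string; then wrap the relevant code in `with log_scope(UserId=123):`. Because the scope lives in a context variable, nested scopes accumulate and are removed again when their block exits.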

Thirdly, if you can get your system into a state where the problem is reproducible, but restarting the application would lose that state, you can update your logging setup so that the minimum severity of logged messages can be changed dynamically at runtime.
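One possible way to do this, sketched in Python under the assumption of a Unix-like host, is to flip the level when the process receives a signal. toggle_debug and install_level_toggle are hypothetical names for this example; an admin endpoint or a watched config file would serve the same purpose.

```python
import logging
import signal


def toggle_debug(logger, normal_level=logging.INFO):
    """Switch `logger` between DEBUG and `normal_level`; return the new level."""
    new_level = normal_level if logger.level == logging.DEBUG else logging.DEBUG
    logger.setLevel(new_level)
    return new_level


def install_level_toggle(logger, signum):
    """Toggle DEBUG logging whenever the process receives `signum`.

    Must be called from the main thread, as required by signal.signal.
    """
    def _on_signal(received_signum, frame):
        toggle_debug(logger)

    signal.signal(signum, _on_signal)
```

With `install_level_toggle(logging.getLogger(), signal.SIGUSR1)` called at startup, running `kill -USR1 <pid>` against the live process turns DEBUG output on, and sending the signal again turns it back off, with no restart in between.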
