Kimball Robinson
Kimball Robinson

Reputation: 3388

How to automatically collapse repetitive log output in log4j

Every once in a while, a server or database error causes thousands of the same stack trace in the server log files. It might be a different error/stacktrace today than a month ago. But it causes the log files to rotate completely, and I no longer have visibility into what happened before. (Alternately, I don't want to run out of disk space, which for reasons outside my control right now is limited--I'm addressing that issue separately). At any rate, I don't need thousands of copies of the same stack trace--just a dozen or so should be enough.

I would like it if I could have log4j/log4j2/another system automatically collapse repetitive errors, so that they don't fill up the log files. For example, a threshold of maybe 10 or 100 exceptions from the same place might trigger log4j to just start counting, and wait until they stop coming, then output a count of how many more times they appeared.

What pre-made solutions exist (a quick survey with links is best)? If this is something I should implement myself, what is a good pattern to start with and what should I watch out for?

Thanks!

Upvotes: 6

Views: 2983

Answers (2)

natetitterton
natetitterton

Reputation: 1290

Log4j's BurstFilter will certainly help prevent you filling your disks. Remember to configure it so that it applies in as limited a section of code as you can, or you'll filter out messages you might want to keep (that is, don't use it on your appender, but on a particular logger that you isolate in your code).

I wrote a simple utility class at one point that wrapped a logger and filtered based on n messages within a given Duration. I used instances of it around most of my warning and error logs to protect the off chance that I'd run into problems like you did. It worked pretty well for my situation, especially because it was easier to quickly adapt for different situations.

Something like:

...

public DurationThrottledLogger(Logger logger, Duration throttleDuration, int maxMessagesInPeriod) {
   ...
}

public void info(String msg) {
    getMsgAddendumIfNotThrottled().ifPresent(addendum->logger.info(msg + addendum));
}


private synchronized Optional<String> getMsgAddendumIfNotThrottled() {
    LocalDateTime now = LocalDateTime.now();
    String msgAddendum;
    if (throttleDuration.compareTo(Duration.between(lastInvocationTime, now)) <= 0) {
        // last one was sent longer than throttleDuration ago - send it and reset everything
        if (throttledInDurationCount == 0) {
            msgAddendum = " [will throttle future msgs within throttle period]";
        } else {
            msgAddendum = String.format(" [previously throttled %d msgs received before %s]",
                    throttledInDurationCount, lastInvocationTime.plus(throttleDuration).format(formatter));
        }
        totalMessageCount++;
        throttledInDurationCount = 0;
        numMessagesSentInCurrentPeriod = 1;
        lastInvocationTime = now;
        return Optional.of(msgAddendum);
    } else if (numMessagesSentInCurrentPeriod < maxMessagesInPeriod) {
        msgAddendum = String.format(" [message %d of %d within throttle period]", numMessagesSentInCurrentPeriod + 1, maxMessagesInPeriod);
        // within throttle period, but haven't sent max messages yet - send it
        totalMessageCount++;
        numMessagesSentInCurrentPeriod++;
        return Optional.of(msgAddendum);
    } else {
        // throttle it
        totalMessageCount++;
        throttledInDurationCount++;
        return emptyOptional;
    }
}

I'm pulling this from an old version of the code, unfortunately, but the gist is there. I wrote a bunch of static factory methods that I mainly used because they let me write a single line of code to create one of these for that one log message:

} catch (IOException e) {
     DurationThrottledLogger.error(logger, Duration.ofSeconds(1), "Received IO Exception. Exiting current reader loop iteration.", e);
}

This probably won't be as important in your case; for us, we were using a somewhat underpowered graylog instance that we could hose down fairly easily.

Upvotes: 3

rgoers
rgoers

Reputation: 9151

Will the BurstFilter do what you want? If not, please create a Jira issue with the algorithm that would work for you and the Log4j team would be happy to consider it. Better yet, if you can provide a patch it would be much more likely to be incorporated.

Upvotes: 4

Related Questions