perl Regular Expression to find Java StackTrace by keyword

Question

I need to grep a full stack trace from a log file by keyword.

This code works fine, but it is too slow on big files (the bigger the file, the slower it gets). I think the best approach is to improve the regex that finds the keyword, but I could not get it done.

#!/usr/bin/perl

use strict;
use warnings;

my $regexp;
my $stacktrace;
undef $/;

$regexp = shift;
$regexp = quotemeta($regexp);

while (<>) {
  while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                 (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                 (?<THREAD>.*?)\/
                 (?<CLASS>.*?)\s-\s
                 (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
    $stacktrace = $&;
    if ( $+{MESSAGE} =~ /$regexp/ ) {
      print "$stacktrace";
    }
  }
}

Usage: ./grep_log4j.pl

Example: ./grep_log4j.pl Exception sample.log

I think the problem is in $stacktrace = $&; because if I remove that line and simply print all the matching entries, the script runs fast. Here is the version of the script that prints all matches:

#!/usr/bin/perl

use strict;
use warnings;

undef $/;

while (<>) {
  while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                 (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                 (?<THREAD>.*?)\/
                 (?<CLASS>.*?)\s-\s
                 (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
    print_result();
  }
}

sub print_result {
    print "LEVEL: $+{LEVEL}\n";
    print "TIMESTAMP: $+{TIMESTAMP}\n";
    print "THREAD: $+{THREAD}\n";
    print "CLASS: $+{CLASS}\n";
    print "MESSAGE: $+{MESSAGE}\n";
}

Usage: ./grep_log4j.pl

Example: ./grep_log4j.pl sample.log

Log4j pattern: %-1p %d %t/%c{1} - %m%n

Example of logfile:

I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
  at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
java.lang.NullPointerException
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
  at java.lang.Thread.run(Thread.java:619)
D 111012 141906.000 thread/class - Received message: something
S 111012 142006.000 thread/class - Received message: something
I 111012 142106.000 thread/class - Received message: something
I 111013 142206.000 thread/class - Metrics:0/1

You can find my regex on http://gskinner.com/RegExr/ by searching for the log4j keyword.

ErikR · Accepted Answer

You are using:

$/ = undef;

This makes perl read the entire file into memory.
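
As a small illustration of that effect (a sketch only: the sample text and the helper name count_records are made up, and an in-memory filehandle stands in for the log file), the same data comes back as a single record when $/ is undefined and as one record per line otherwise:

```perl
use strict;
use warnings;

# Hypothetical sample data; any multi-line text behaves the same way.
my $text = "line one\nline two\nline three\n";

# Count how many records the readline operator returns
# for a given input record separator.
sub count_records {
    my ($separator) = @_;
    local $/ = $separator;    # undef switches on "slurp mode"
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $count = 0;
    $count++ while <$fh>;     # one iteration per record read
    close $fh;
    return $count;
}

print "slurp mode:   ", count_records(undef), " record(s)\n";
print "line by line: ", count_records("\n"),  " record(s)\n";
```

In slurp mode the regex engine therefore has to work on one string as large as the whole file.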

I would process this file line-by-line like this (assuming the stack trace is associated with the message above the trace):

my $matched;
while (<>) {
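  # LEVEL and REST are used below; TIMESTAMP and THREAD_CLASS are illustrative names.
  if (m/^(?<LEVEL>\S+) \s+ (?<TIMESTAMP>(\d+) \s+ ([\d.])+) \s+ (?<THREAD_CLASS>\S+) \s+ - \s+ (?<REST>.*)/x) {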
    my %captures = %+;
    $matched = ($+{REST} =~ $regexp);
    if ($matched) {
      print "LEVEL: $captures{LEVEL}\n";
      ...
    }
  } elsif ($matched) {
    print;
  }
}

Here is a general technique for parsing multi-line blocks. The following loop reads STDIN one line at a time and feeds complete blocks of the log file to the subroutine process:

my $first;
my $stack = "";
while (<STDIN>) {
  if (m/^\S /) {
    process($first, $stack) if $first;
    $first = $_;
    $stack = "";
  } else {
    $stack .= $_;
  }
}
process($first, $stack) if $first;

sub process {
  my ($first, $stack) = @_;
  # ... do whatever you want here ...
}
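
One possible way to fill in process for the keyword search is sketched below. It makes several assumptions that are not in the answer above: the helper name grep_blocks is hypothetical, the lines come from an array instead of STDIN so the example is self-contained, the keyword is treated as a literal string (as quotemeta does in the question), and the keyword is looked for anywhere in the block, header line included.

```perl
use strict;
use warnings;

# Return every log block (header line plus the stack lines that follow it)
# that contains the keyword. grep_blocks is a hypothetical helper name.
sub grep_blocks {
    my ($keyword, @lines) = @_;
    my $pattern = quotemeta $keyword;    # match the keyword literally
    my @hits;
    my ($first, $stack) = (undef, "");

    # Called once per complete block, like process in the loop above.
    my $process = sub {
        my ($head, $trace) = @_;
        my $block = $head . $trace;
        push @hits, $block if $block =~ /$pattern/;
    };

    for my $line (@lines) {
        if ($line =~ /^\S /) {           # a one-character level starts a new block
            $process->($first, $stack) if defined $first;
            ($first, $stack) = ($line, "");
        } else {
            $stack .= $line;             # continuation line (stack trace)
        }
    }
    $process->($first, $stack) if defined $first;    # flush the last block
    return @hits;
}

# Made-up sample in the question's log format.
my @log = (
    "I 111012 141506.000 thread/class - Received message: something\n",
    "E 111012 141606.000 thread/class - Failed handling mobile request\n",
    "java.lang.NullPointerException\n",
    "  at java.lang.Thread.run(Thread.java:619)\n",
    "W 111012 141706.000 thread/class - Received message: something\n",
);

print grep_blocks("NullPointerException", @log);
```

Because each block is only a few lines long, the match runs against small strings and never needs the whole file in memory.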
