Reputation: 591
I need a regex for parsing Apache error log files.
For example, here is a portion of /var/log/httpd/error_log:
[Sun Sep 02 03:34:01 2012] [notice] Digest: done
[Sun Sep 02 03:34:01 2012] [notice] Apache/2.2.15 (Unix) DAV/2 mod_ssl/2.2.15 OpenSSL/1.0.0- fips SVN/1.6.11 configured -- resuming normal operations
[Sun Sep 02 03:34:01 2012] [error] avahi_entry_group_add_service_strlst("localhost") failed: Invalid host name
[Sun Sep 02 08:01:14 2012] [error] [client 216.244.73.194] File does not exist: /var/www/html/manager
[Sun Sep 02 11:04:35 2012] [error] [client 58.218.199.250] File does not exist: /var/www/html/proxy
I want a regex that uses a space as the field delimiter but keeps spaces that are embedded inside a field. The Apache error log format alternates between:
[DAY MMM DD HH:MM:SS YYYY] [MSG_TYPE] DESCRIPTOR: MESSAGE
[DAY MMM DD HH:MM:SS YYYY] [MSG_TYPE] [SOURCE IP] ERROR: DETAIL
I created two regexes. The first one is:
^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s/.(")-]+[\-:]) ([\w/\s]+)$
This one is simple and just matches the contents as they are.
I want something like the following regex, which I created:
(?<=|\s)([\w:\S]+)
This one doesn't give me the desired output: it doesn't keep embedded spaces inside a field. So I need a regex that groups each field, keeps embedded spaces, and uses a space as the delimiter. Please help me out with the logic.
My code:
void regexparser(CharBuffer cb) {
    try {
        Pattern linePattern = Pattern.compile(".*\r?\n");
        Pattern csvpat = Pattern.compile(
            "^\\[([\\w:\\s]+)\\] \\[([\\w]+)\\] (\\[([\\w\\d.\\s]+)\\])?([\\w\\s/.(\")-]+[\\-:]) ([\\w/\\s].+)",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
        Matcher lm = linePattern.matcher(cb);
        Matcher pm = null;
        while (lm.find()) {
            //System.out.print("1st loop");
            CharSequence cs = lm.group();
            if (pm == null)
                pm = csvpat.matcher(cs);
            else
                pm.reset(cs);
            while (pm.find()) {
                // System.out.println("2nd loop");
                //System.out.println(pm.groupCount());
                //CharSequence ps = pm.group();
                //System.out.print(ps);
                // group 4 is the optional "client <ip>" part
                if (pm.group(4) == null)
                    System.out.println(pm.group(1) + " " + pm.group(2) + " " + pm.group(5) + " " + pm.group(6));
                else
                    System.out.println(pm.group(1) + " " + pm.group(2) + " " + pm.group(4) + " " + pm.group(5) + " " + pm.group(6));
            }
        }
    } catch (Exception e) {
        // the post is cut off before this point; a generic handler is assumed
        e.printStackTrace();
    }
}
Upvotes: 0
Views: 701
Reputation: 2668
I agree that this task should be done with an existing solution to parse Apache logs.
However, if you want to try something out for training purposes, maybe you want to start with this. Instead of parsing everything in one single huge regex, I do it in small steps that are much more readable:
#!/usr/bin/env perl
use strict;
use warnings;
use DateTime::Format::Strptime;
use feature 'say';

# iterate log lines
while (defined(my $line = <DATA>)) {
    chomp $line;

    # prepare
    my %data;
    my $strp = DateTime::Format::Strptime->new(
        pattern => '%a %b %d %H:%M:%S %Y',
    );

    # consume date/time
    next unless $line =~ s/^\[(\w+ \w+ \d+ \d\d:\d\d:\d\d \d{4})\] //;
    $data{date} = $strp->parse_datetime($1);

    # consume message type
    next unless $line =~ s/^\[(\w+)\] //;
    $data{type} = $1;

    # "[source ip]" alternative
    if ($line =~ s/^\[(\w+) ([\d\.]+)\] //) {
        @data{qw(source ip)} = ($1, $2);

        # consume "error: detail"
        next unless $line =~ s/([^:]+): (.*)//;
        @data{qw(error detail)} = ($1, $2);
    }
    # "descriptor: message" alternative
    elsif ($line =~ s/^([^:]+): (.*)//) {
        @data{qw(descriptor message)} = ($1, $2);
    }
    # invalid
    else {
        next;
    }

    # something left: invalid
    next if length $line;

    # parsed ok: output
    say "$_: $data{$_}" for keys %data;
    say '-' x 40;
}
__DATA__
[Sun Sep 02 03:34:01 2012] [notice] Digest: done
[Sun Sep 02 03:34:01 2012] [notice] Apache/2.2.15 (Unix) DAV/2 mod_ssl/2.2.15 OpenSSL/1.0.0- fips SVN/1.6.11 configured -- resuming normal operations
[Sun Sep 02 03:34:01 2012] [error] avahi_entry_group_add_service_strlst("localhost") failed: Invalid host name
[Sun Sep 02 08:01:14 2012] [error] [client 216.244.73.194] File does not exist: /var/www/html/manager
[Sun Sep 02 11:04:35 2012] [error] [client 58.218.199.250] File does not exist: /var/www/html/proxy
Output (the key order varies, since Perl hashes are unordered):
descriptor: Digest
date: 2012-09-02T03:34:01
type: notice
message: done
----------------------------------------
descriptor: avahi_entry_group_add_service_strlst("localhost") failed
date: 2012-09-02T03:34:01
type: error
message: Invalid host name
----------------------------------------
detail: /var/www/html/manager
source: client
ip: 216.244.73.194
date: 2012-09-02T08:01:14
error: File does not exist
type: error
----------------------------------------
detail: /var/www/html/proxy
source: client
ip: 58.218.199.250
date: 2012-09-02T11:04:35
error: File does not exist
type: error
----------------------------------------
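Note that according to your format description, the second line is invalid: it contains no ": " separator, so it matches neither the "DESCRIPTOR: MESSAGE" nor the "ERROR: DETAIL" form, and the program ignores it.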
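Since the question's code is in Java, here is a rough sketch of the same step-by-step idea in that language. The class name `ErrorLogLineParser` and the method `parse` are made up for this illustration, and the date is kept as a plain string rather than parsed into a date object. Treat it as a starting point under those assumptions, not a complete log parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: consumes one field at a time from the front of the line.
class ErrorLogLineParser {
    private static final Pattern DATE =
        Pattern.compile("^\\[(\\w+ \\w+ \\d+ \\d\\d:\\d\\d:\\d\\d \\d{4})\\] ");
    private static final Pattern TYPE = Pattern.compile("^\\[(\\w+)\\] ");
    private static final Pattern SOURCE = Pattern.compile("^\\[(\\w+) ([\\d.]+)\\] ");
    private static final Pattern PAIR = Pattern.compile("^([^:]+): (.*)$");

    /** Returns the parsed fields, or null if the line fits neither format. */
    static Map<String, String> parse(String line) {
        Map<String, String> data = new LinkedHashMap<>();

        // consume date/time (kept as a string in this sketch)
        Matcher m = DATE.matcher(line);
        if (!m.find()) return null;
        data.put("date", m.group(1));
        line = line.substring(m.end());

        // consume message type
        m = TYPE.matcher(line);
        if (!m.find()) return null;
        data.put("type", m.group(1));
        line = line.substring(m.end());

        // optional "[source ip]" part
        m = SOURCE.matcher(line);
        boolean hasSource = m.find();
        if (hasSource) {
            data.put("source", m.group(1));
            data.put("ip", m.group(2));
            line = line.substring(m.end());
        }

        // the rest must be "error: detail" or "descriptor: message"
        m = PAIR.matcher(line);
        if (!m.matches()) return null;
        data.put(hasSource ? "error" : "descriptor", m.group(1));
        data.put(hasSource ? "detail" : "message", m.group(2));
        return data;
    }

    public static void main(String[] args) {
        String[] samples = {
            "[Sun Sep 02 03:34:01 2012] [notice] Digest: done",
            "[Sun Sep 02 08:01:14 2012] [error] [client 216.244.73.194] File does not exist: /var/www/html/manager",
        };
        for (String s : samples) {
            System.out.println(parse(s));
        }
    }
}
```

As in the Perl version, a line without a ": " separator (such as the "Apache/2.2.15 ..." sample) is rejected, here by returning null. Embedded spaces survive because each step matches a whole field instead of splitting on whitespace.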
Upvotes: 1