Umberto

Reputation: 103

Regular Expression for parsing apache error log files

I need a regular expression, for use in a Java program, to parse Apache error log files such as:

[Thu Sep 27 12:08:18 2012] [error] [client 151.10.158.10] File does not exist: /srv/www/htdocs/pad/favicon.ico
[Thu Oct 04 17:02:42 2012] [error] [client 151.10.1.10] File does not exist: > /srv/www/htdocs/pad/favicon.ico
[Wed Oct 17 10:16:40 2012] [error] [client 151.10.14.60] File does not exist: /srv/www/htdocs/pad/sites/all/modules/fckeditor/fckeditor/editor/userfiles, referer: http://pad.sta.uniroma1.it/sites/all/modules/fckeditor/fckeditor/editor/fckeditor.html?InstanceName=edit-body&Toolbar=DrupalFull

I have already tried several solutions (some of which were previously posted on Stack Overflow); the one that seems to work best is:

^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s/.(")-]+[\-:]) ([\w/\s]+)$

However, it seems to be unable to match strings like:

[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1

How can I fix it?

EDIT: I checked all the proposed solutions. Although they increase the number of matched lines, none of them can deal with cases like the following:

[Fri Jul 15 00:24:41 2011] [error] [client 219.12.35.141] script '/srv/www/htdocs/pad2/scripts/setup.php' not found or unable to stat
[Mon May 28 18:43:25 2012] [error] [client 88.110.28.25] Invalid URI in request GET HTTP/1.1 HTTP/1.1

Note also that it would be OK for me to receive, in a single group, all the data following the square brackets including the client keyword.

Upvotes: 3

Views: 4648

Answers (5)

Sergii Lagutin

Reputation: 10671

receiving the information encoded in the first three [...] groups

Match each [...] field as a [ followed by one or more characters other than ], then the closing ]: \[[^\]]+\]

Capture the rest of the line with .*, which matches from the current position to the end of the line.

So your full solution looks like this:

^(\[[^\]]+\]) (\[[^\]]+\]) (\[[^\]]+\]) (.*)$
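Since the question asks for Java, here is a minimal sketch of how the pattern above might be used with java.util.regex. The class name ApacheErrorLogParser and the parse helper are made up for illustration; only the regex itself comes from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApacheErrorLogParser {
    // Three bracketed fields, then everything else on the line as one group.
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern LINE = Pattern.compile(
        "^(\\[[^\\]]+\\]) (\\[[^\\]]+\\]) (\\[[^\\]]+\\]) (.*)$");

    /** Returns {timestamp, level, client, message}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "[Fri Jul 15 00:24:41 2011] [error] [client 219.12.35.141] "
            + "script '/srv/www/htdocs/pad2/scripts/setup.php' not found or unable to stat";
        String[] fields = parse(line);
        if (fields == null) {
            System.out.println("no match");
        } else {
            for (String field : fields) {
                System.out.println(field);
            }
        }
    }
}
```

Because the message is captured with .*, the pattern does not care which characters (quotes, colons, dots) appear after the third bracketed field. It does assume that all three bracketed fields are present; a line without a [client ...] field would not match.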

Upvotes: 3

Ahosan Karim Asik

Reputation: 3299

 $a="[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1\n";
 $a .="[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1\n";
 $a .="[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1\n";
preg_match_all("/(\[.*\])\s+(\[.*\])\s+(\[.*\])\s+([a-zA-Z0-9\s]+:)\s*(.*)/",$a,$m) ; var_dump($m);

Try this. Output:

array (size=6)
  0 => 
    array (size=3)
      0 => string '[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=128)
      1 => string '[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=128)
      2 => string '[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET : 81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=129)
  1 => 
    array (size=3)
      0 => string '[Thu May 17 22:41:54 2012]' (length=26)
      1 => string '[Thu May 17 22:41:54 2012]' (length=26)
      2 => string '[Thu May 17 22:41:54 2012]' (length=26)
  2 => 
    array (size=3)
      0 => string '[error]' (length=7)
      1 => string '[error]' (length=7)
      2 => string '[error]' (length=7)
  3 => 
    array (size=3)
      0 => string '[client 118.238.211.206]' (length=24)
      1 => string '[client 118.238.211.206]' (length=24)
      2 => string '[client 118.238.211.206]' (length=24)
  4 => 
    array (size=3)
      0 => string 'Invalid URI in request GET :' (length=28)
      1 => string 'Invalid URI in request GET :' (length=28)
      2 => string 'Invalid URI in request GET :' (length=28)
  5 => 
    array (size=3)
      0 => string '81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=40)
      1 => string '81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=40)
      2 => string '81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=40)

Upvotes: 0

anubhava

Reputation: 785128

The last segment of your regex doesn't seem right. This simplified regex should work:

^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\]) ([\s\w/.(")-]+[-:])(.+)$

Upvotes: 0

Sly

Reputation: 361

There is no space after the colon in "GET :81", but your regex requires one.

This one works:

^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s\/.(")-]+[\-:])\s?([\w\/\s.]+)

example : http://regex101.com/r/xO1wG2/2
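To see the difference from Java, the sketch below runs the pattern from the question and the pattern from this answer against the "GET :81" line. The class name ColonSpacingDemo and the tail helper are hypothetical; the two regexes are copied from the thread, with backslashes and the double quote escaped for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonSpacingDemo {
    // Pattern from the question: a literal space must follow the ':' or '-',
    // and the last group does not allow '.'.
    static final Pattern ORIGINAL = Pattern.compile(
        "^(\\[[\\w:\\s]+\\]) (\\[[\\w]+\\]) (\\[[\\w\\d.\\s]+\\])?"
        + "([\\w\\s/.(\")-]+[\\-:]) ([\\w/\\s]+)$");

    // Pattern from this answer: whitespace after the separator is optional,
    // and '.' is allowed in the last group.
    static final Pattern RELAXED = Pattern.compile(
        "^(\\[[\\w:\\s]+\\]) (\\[[\\w]+\\]) (\\[[\\w\\d.\\s]+\\])?"
        + "([\\w\\s\\/.(\")-]+[\\-:])\\s?([\\w\\/\\s.]+)");

    /** Returns the fifth captured group if the pattern is found in the line, else null. */
    static String tail(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(5) : null;
    }

    public static void main(String[] args) {
        String line = "[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] "
            + "Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1";
        System.out.println("original: " + tail(ORIGINAL, line));
        System.out.println("relaxed:  " + tail(RELAXED, line));
    }
}
```

With this input the original pattern finds nothing, while the relaxed one captures everything after the colon in its last group. Both patterns still depend on a ':' or '-' appearing in the message, so lines without such a separator will not match either of them.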

Upvotes: 0

Avinash Raj

Reputation: 174706

The regex below would match all the error formats mentioned above.

^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s\/.(")-]+[\-:])\s*>?\s*([\w\/\s.]+)(?:\s*,(\s*\w+:)\s*([\w\/.=?:&-]+))?$

Upvotes: 0
