streetparade

Reputation: 32918

How to parse Apache logs using a regex in PHP

I'm trying to split this string in PHP:

11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"

How can I split this into IP, date, HTTP method, domain name, and browser?

Upvotes: 7

Views: 14877

Answers (4)

recurse

Reputation: 634

// Parses NCSA Combined Log Format lines:
$pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';

Usage:

if (preg_match($pattern, $yourstuff, $matches)) {
    // Put each part of the match in a named variable
    list($whole_match, $remote_host, $logname, $user, $date_time, $method, $request, $protocol, $status, $bytes, $referer, $user_agent) = $matches;
}
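As a variation on the idea above, named capture groups can replace the long list() call. This is only a sketch: the group names mirror the variables used above, the sample line is the one from the question, and the pattern is tightened slightly (\S+ and [^"]* instead of .*), which is an assumption rather than part of the original answer. Note that date_time here excludes the surrounding brackets.

```php
<?php
// Sample line taken from the question.
$line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"';

// Named groups (?<name>...) make each field addressable by key.
$pattern = '/^(?<remote_host>\S+) (?<logname>\S+) (?<user>\S+) \[(?<date_time>[^\]]+)\] '
         . '"(?<method>\S+) (?<request>\S+) (?<protocol>[^"]+)" '
         . '(?<status>[0-9\-]+) (?<bytes>[0-9\-]+) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)"$/';

$entry = [];
if (preg_match($pattern, $line, $m)) {
    // preg_match returns both numeric and named keys; keep only the named ones.
    $entry = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r($entry);
```

A request line without a protocol (which some clients do send) will not match this stricter pattern, so the looser original may be preferable for messy logs.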

Upvotes: 2

Gumbo

Reputation: 655785

This looks like Apache's combined log format. Try this regular expression:

/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m

The matching groups are as follows:

  1. remote IP address
  2. request date
  3. request HTTP method
  4. User-Agent value

Note that the requested domain is not recorded in this format. The second quoted string is the Referer value.
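For reference, here is a minimal sketch of how the expression above could be used from PHP with preg_match. The sample line is the one from the question; the array keys (ip, date, method, user_agent) are names chosen for illustration only.

```php
<?php
// Sample line taken from the question.
$line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"';

$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m';

$fields = null;
if (preg_match($pattern, $line, $m)) {
    // Map the four capture groups to readable keys (names are illustrative).
    $fields = [
        'ip'         => $m[1],
        'date'       => $m[2],
        'method'     => $m[3],
        'user_agent' => $m[4],
    ];
}

print_r($fields);
```

Because of the /m modifier, the same pattern can also be applied to a multi-line string with preg_match_all, where ^ and $ then match at each line boundary.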

Upvotes: 14

Daniel S. Sterling

Reputation: 1339

Here is some Perl, not PHP, but the regex to use is the same. This regex works to parse everything I've seen; clients can send some bizarre requests:

my ($ip, $date, $method, $url, $protocol, $alt_url, $code, $bytes,
        $referrer, $ua) = (m/
    ^(\S+)\s                    # IP
    \S+\s+                      # remote logname
    (?:\S+\s+)+                 # remote user
    \[([^]]+)\]\s               # date
    "(\S*)\s?                   # method
    (?:((?:[^"]*(?:\\")?)*)\s   # URL
    ([^"]*)"\s|                 # protocol
    ((?:[^"]*(?:\\")?)*)"\s)    # or, possibly URL with no protocol
    (\S+)\s                     # status code
    (\S+)\s                     # bytes
    "((?:[^"]*(?:\\")?)*)"\s    # referrer
    "(.*)"$                     # user agent
/x);
die "Couldn't match $_" unless $ip;
$alt_url ||= '';
$url ||= $alt_url;

Upvotes: 4

KARASZI István

Reputation: 31477

You should check out a regular expression tutorial. But here is the answer:

if (preg_match('/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/', $line, $m)) {
  $ip = $m[1];
  $date = $m[2];
  $method = $m[3];
  $referer = $m[4];
  $browser = $m[5];
}

Note that the log contains the HTTP referer, not the domain name of the requested site.
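If the host part of the referer is close enough to what is needed, it can be pulled out with parse_url, and the timestamp can be turned into a DateTime object. The following is a sketch under those assumptions: the values are the ones from the sample line in the question, and the referer host is the referring page's host, which is not necessarily the domain that served the request.

```php
<?php
// Values as they would be captured from the sample line in the question.
$date    = '25/Jan/2000:14:00:01 +0100';
$referer = 'http://domain.com/index.html';

// Host of the referring page (null when the referer has no host, e.g. "-").
$refererHost = parse_url($referer, PHP_URL_HOST);

// Apache's default time format: day/month-abbreviation/year:hour:minute:second zone.
$dt = DateTime::createFromFormat('d/M/Y:H:i:s O', $date);

// createFromFormat returns false when the string does not match the format.
$iso = ($dt !== false) ? $dt->format('c') : null;

echo $refererHost, "\n";
echo $iso, "\n";
```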

Upvotes: 4
