nahaelem

Reputation: 133

Splitting a server access log string

I have been trying for some time to split this server access log string, without success. The string comes in this format:

10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014

I have tried `str.split()` and `str("\t")`, but neither worked. Thanks for your help.

Upvotes: 0

Views: 83

Answers (1)

Chrispresso

Reputation: 4070

You need to use a regular expression.

For instance, you could do this:

import re

line = '10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014'
# Each named group (?P<name>...) labels one field of the log line.
regexString = r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) (?P<time>\[.*\]) (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
regex = re.compile(regexString)
match = regex.match(line)
# match is None when the line does not fit the pattern.
if match is not None:
    ip = match.group('IP')
    ident = match.group('ID')  # named ident to avoid shadowing the built-in id()
    # etc.
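Rather than calling match.group() once per field, all named groups can be collected in one step with groupdict(). The following is a minimal sketch using the same pattern as above; the names LOG_PATTERN and parse_line are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once so it can be reused per line.
LOG_PATTERN = re.compile(
    r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) '
    r'(?P<time>\[.*\]) (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
)


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match is not None else None


line = '10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014'
fields = parse_line(line)
if fields is not None:
    # Every value is a string; convert status and size with int() if needed.
    print(fields['IP'], fields['status'], fields['size'])
```

Note that the time and request values keep their surrounding brackets and quotes, because those characters are inside the named groups.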

If you want to extract each part of the time (the day, month, year, etc.), you can either run another regex on match.group('time') or make regexString more explicit about how to parse it.
For instance, you could replace the time group with: \[(?P<day>\d+)/(?P<month>[A-Za-z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) (?P<zone>[+-]\d+)\]
Matching the sign with [+-] lets the zone group handle positive offsets as well as negative ones.

This would give you:
regexString = r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) \[(?P<day>\d+)/(?P<month>[A-Za-z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) (?P<zone>[+-]\d+)\] (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
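To see the expanded pattern in action, here is a short sketch that runs it against the sample line from the question. It assumes the zone group accepts either sign (so an offset such as +0100 would also match); the name DETAILED_PATTERN is illustrative only.

```python
import re

# Expanded pattern: the bracketed timestamp is split into named parts.
# Assumption: the zone group captures the sign together with the digits.
DETAILED_PATTERN = re.compile(
    r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) '
    r'\[(?P<day>\d+)/(?P<month>[A-Za-z]+)/(?P<year>\d+)'
    r':(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) (?P<zone>[+-]\d+)\] '
    r'(?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
)

line = '10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014'
match = DETAILED_PATTERN.match(line)
# Fall back to an empty dict so later code can check for missing fields.
parts = match.groupdict() if match is not None else {}

if parts:
    print(parts['day'], parts['month'], parts['year'])
    print(parts['hour'], parts['minute'], parts['second'], parts['zone'])
```

All captured values are strings, so numeric parts such as day or year need int() before doing arithmetic on them.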

Upvotes: 2
