Reputation: 133
I have been trying for some time to split this server access log string, but to no avail. The string comes in this format:
10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014
I have tried `str.split()` and `str.split("\t")`, but neither gives me the fields I need. Thanks for your help.
Upvotes: 0
Views: 83
Reputation: 4070
Plain splitting will not work well here because the timestamp and the request contain spaces themselves. You need to use a regular expression instead.
For instance:
import re

line = '10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014'
# One named group per field; the time keeps its brackets and the request keeps its quotes.
regexString = r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) (?P<time>\[.*\]) (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
regex = re.compile(regexString)
match = regex.match(line)
if match is not None:
    ip = match.group('IP')
    ident = match.group('ID')  # named ident to avoid shadowing the built-in id()
    # etc.
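To collect every field at once rather than calling `group()` per name, `match.groupdict()` returns all named groups as a dictionary. The following is a minimal sketch built on the same pattern; the names `LOG_PATTERN` and `parse_line` are made up for illustration.

```python
import re

# Same pattern as above, split across two string literals for readability.
LOG_PATTERN = re.compile(
    r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) '
    r'(?P<time>\[.*\]) (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match is not None else None


line = '10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET /assets/css/reset.css HTTP/1.1" 200 1014'
fields = parse_line(line)
print(fields['IP'], fields['status'], fields['size'])
```

Returning `None` for lines that do not match lets the caller skip or log malformed entries instead of raising an exception.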
If you want to extract each component of the timestamp (day, month, year, and so on), you can either run another regex on match.group('time') or make regexString more explicit about how to parse it.
For instance, you could replace the time group with: \[(?P<day>\d+)/(?P<month>[A-Za-z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) (?P<zone>[+-]\d+)\]
The zone group captures the sign as well, so offsets east of UTC such as +0100 also match. This would get you:
regexString = r'(?P<IP>[0-9.]+) (?P<ID>[\w-]+) (?P<user>[\w-]+) \[(?P<day>\d+)/(?P<month>[A-Za-z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) (?P<zone>[+-]\d+)\] (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
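As an alternative to adding more capture groups, the bracketed timestamp can be handed to the standard library's `datetime.strptime`, which gives back a timezone-aware `datetime`. This is a different technique from the regex above, shown only as a sketch. It assumes Python 3 (for `%z`) and a locale in which `%b` matches English month abbreviations such as `Jul`.

```python
from datetime import datetime

# The value of match.group('time') from the first pattern, brackets included.
raw_time = '[15/Jul/2009:15:50:35 -0700]'

# The brackets, slashes and colons in the format are matched literally.
timestamp = datetime.strptime(raw_time, '[%d/%b/%Y:%H:%M:%S %z]')

print(timestamp.year, timestamp.month, timestamp.day)
print(timestamp.utcoffset())
```

The individual components are then available as attributes (`timestamp.hour`, `timestamp.minute`, and so on), and timestamps can be compared or sorted directly.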
Upvotes: 2