Reputation: 195
I have a section of a log file that looks like this:
"/log?action=End&env=123&id=8000&cat=baseball"
"/log?action=start&get=3210&rsa=456&key=golf"
I want to parse out each section so the results would look like this:
('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')
('/log?action=', 'start', 'get=3210', 'rsa=456', 'key=golf')
I've looked into regex and matching, but many of my log lines have different parameter sequences, which leads me to believe that a regex won't work. Any suggestions?
Upvotes: 0
Views: 681
Reputation: 366053
This is clearly a fragment of a URL, so the best way to parse it is to use URL parsing tools. The stdlib comes with urlparse, which does exactly what you want.
For example:
>>> import urlparse
>>> s = "/log?action=End&env=123&id=8000&cat=baseball"
>>> bits = urlparse.urlparse(s)
>>> variables = urlparse.parse_qs(bits.query)
>>> variables
{'action': ['End'], 'cat': ['baseball'], 'env': ['123'], 'id': ['8000']}
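For readers on Python 3, here is a minimal sketch of the same session using urllib.parse (the module urlparse was renamed to in Python 3). It also prints the path and query separately, to show that urlparse splits "/log" off from the query string before parse_qs ever sees it. The dict ordering in the comment assumes Python 3.7+, where dicts keep insertion order.

```python
from urllib.parse import urlparse, parse_qs

s = "/log?action=End&env=123&id=8000&cat=baseball"
bits = urlparse(s)

# urlparse separates the path from the query string
print(bits.path)   # /log
print(bits.query)  # action=End&env=123&id=8000&cat=baseball

# parse_qs maps each key to a list, because a key may appear more than once
variables = parse_qs(bits.query)
print(variables)   # {'action': ['End'], 'env': ['123'], 'id': ['8000'], 'cat': ['baseball']}
```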
If you really want to get the format you asked for, you can use parse_qsl instead, and then join the key-value pairs back together. I'm not sure why you want the /log to be included in the first query variable, or the first query variable's value to be separate from its key, but even that is doable if you insist:
>>> variables = urlparse.parse_qsl(s)
>>> result = (variables[0][0] + '=', variables[0][1]) + tuple(
...     '='.join(kv) for kv in variables[1:])
>>> result
('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')
If you're using Python 3.x, just change urlparse to urllib.parse, and the rest is exactly the same.
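A minimal Python 3 sketch of the parse_qsl trick applied to every line of a log, assuming one "/log?..." fragment per line and at least one key=value pair on each. The helper name to_requested_tuple is hypothetical, chosen just for this example.

```python
from urllib.parse import parse_qsl

def to_requested_tuple(line):
    """Hypothetical helper: turn one '/log?...' line into the tuple shape from the question."""
    # Parsing the whole line (not just the query) makes '/log?action' the first key
    pairs = parse_qsl(line.strip())
    first_key, first_value = pairs[0]
    # Keep the first key (with its '=') and its value apart; rejoin the remaining pairs
    return (first_key + '=', first_value) + tuple('='.join(kv) for kv in pairs[1:])

log = """/log?action=End&env=123&id=8000&cat=baseball
/log?action=start&get=3210&rsa=456&key=golf"""

results = [to_requested_tuple(line) for line in log.splitlines() if line.strip()]
for r in results:
    print(r)
```

Note that parse_qsl percent-decodes names and values, turns "+" into a space, and silently drops fields with empty values unless you pass keep_blank_values=True, so the rejoined strings are not always byte-for-byte identical to the input.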
Upvotes: 3
Reputation: 14118
It's a bit hard to say without knowing what the domain of possible inputs is, but here's a guess at what will work for you:
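log = "/log?action=End&env=123&id=8000&cat=baseball\n/log?action=start&get=3210&rsa=456&key=golf"
# split each line into its '&'-separated fields
logLines = [line.split("&") for line in log.split('\n')]
# split the first field at '=' and keep the remaining fields unchanged
logLines = [tuple(line[0].split("=")+line[1:]) for line in logLines]
print(logLines)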
OUTPUT:
[('/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball'),
('/log?action', 'start', 'get=3210', 'rsa=456', 'key=golf')]
This assumes that you don't really need the "=" at the end of the first string.
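If you do want the trailing "=" kept on the first string, a small Python 3 variation can use str.partition, which splits only at the first "=" and hands back the separator itself. This is a sketch under the same assumption as above ("&" separates the fields); the helper name split_first_pair is hypothetical.

```python
def split_first_pair(line):
    # Hypothetical helper: assumes '&' separates fields and the first field is '/log?action=<value>'
    first, *rest = line.split("&")
    # partition splits only at the first '=', so a value that itself contains '=' stays whole
    key, sep, value = first.partition("=")
    return (key + sep, value, *rest)

log = "/log?action=End&env=123&id=8000&cat=baseball\n/log?action=start&get=3210&rsa=456&key=golf"
log_lines = [split_first_pair(line) for line in log.splitlines()]
print(log_lines)
```

If a first field has no "=" at all, partition returns the whole field as the key with an empty separator and value, so such lines come through as (field, '', ...) rather than raising.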
Upvotes: 0
Reputation: 12326
You can split a couple of times:
s = '/log?action=End&env=123&id=8000&cat=baseball'
L = s.split("&")             # split into the '&'-separated chunks
L[0:1] = L[0].split("=")     # replace the first chunk with its two halves
print(L)
Output:
['/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball']
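The same slice-assignment trick can be wrapped in a function and applied to a whole file. This is a sketch assuming one URL fragment per line; io.StringIO stands in for a real file handle such as open("access.log"), where that filename is hypothetical, and split_fields is likewise a hypothetical name.

```python
import io

def split_fields(line):
    """Hypothetical helper: '&'-split the line, then split the first chunk at its first '='."""
    fields = line.split("&")
    # maxsplit=1 keeps a value that itself contains '=' in one piece
    fields[0:1] = fields[0].split("=", 1)
    return fields

# io.StringIO stands in for a real file, e.g. open("access.log") (hypothetical filename)
log_file = io.StringIO(
    "/log?action=End&env=123&id=8000&cat=baseball\n"
    "/log?action=start&get=3210&rsa=456&key=golf\n"
)

# skip blank lines and drop each line's trailing newline before splitting
parsed = [split_fields(line.rstrip("\n")) for line in log_file if line.strip()]
for fields in parsed:
    print(fields)
```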
Upvotes: 0