Reputation: 23
I need to parse the following string in Python into a dictionary:
2014:02:02-12:24:17 NAMETEST ulogd[4834]: id="xxxx" severity="xxxx" sys="xxxx" sub="xxxx" name="xxxx aaaa" action="xxxx" fwrule="xxxx" outitf="xxxx" srcmac="xxxx" srcip="xxxx" dstip="xxxx" proto="x" length="xxxx" tos="xxxx" prec="xxxx" ttl="xx" srcport="xxxx" dstport="xxxx" tcpflags="xxxx"
I can't use split(' ') to split on spaces because a field such as name="xxxx aaaa" can itself contain a space.
So far, I have extracted only the values with the following regex:
re.findall('"([^"]*)"', line)
But now I need the result as a dictionary, so that, for example, line['id'] = 1111.
Which regex would do that? Any ideas?
Upvotes: 0
Views: 2923
Reputation: 473853
You can use re.findall() to find the key-value pairs (here s is the log line from the question):
>>> import re
>>> groups = re.findall(r'(\w+)="(.*?)"', s)
>>> line = dict(groups)
>>>
>>> from pprint import pprint
>>> pprint(line)
{'action': 'xxxx',
'dstip': 'xxxx',
'dstport': 'xxxx',
'fwrule': 'xxxx',
'id': 'xxxx',
'length': 'xxxx',
'name': 'xxxx aaaa',
'outitf': 'xxxx',
'prec': 'xxxx',
'proto': 'x',
'severity': 'xxxx',
'srcip': 'xxxx',
'srcmac': 'xxxx',
'srcport': 'xxxx',
'sub': 'xxxx',
'sys': 'xxxx',
'tcpflags': 'xxxx',
'tos': 'xxxx',
'ttl': 'xx'}
(\w+)="(.*?)" matches one or more word characters (the \w+ part: letters, digits, or underscore), followed by =", followed by any characters (.*?, non-greedy), followed by ". The parentheses define capturing groups.
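To make the role of the two capturing groups concrete, here is a minimal sketch that wraps the same pattern in a function. The names parse_fields and PAIR_RE are hypothetical, chosen only for illustration, and the sample line is a shortened version of the one in the question. Because the pattern has two groups, re.findall() returns a list of (key, value) tuples, which dict() accepts directly.

```python
import re

# Hypothetical names for illustration; the pattern is the one used above.
PAIR_RE = re.compile(r'(\w+)="(.*?)"')

def parse_fields(line):
    """Return every key="value" pair found in line as a dict."""
    # With two capturing groups, findall() yields (key, value) tuples.
    pairs = PAIR_RE.findall(line)
    return dict(pairs)

# Shortened sample of the log line from the question.
sample = ('2014:02:02-12:24:17 NAMETEST ulogd[4834]: '
          'id="xxxx" name="xxxx aaaa" proto="x"')

fields = parse_fields(sample)
print(fields['name'])  # the space inside the quotes is preserved
```

The timestamp and the ulogd[4834]: prefix are skipped because they are not followed by =". All values come back as strings, so a numeric field would still need an explicit int() conversion.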
Upvotes: 2