Reputation: 61
I have the string below from my iptables logs and I want to parse its full content. My current regex parses about 90% of it, but I need all of the log content.
My Python regex:
regex = re.compile('([^ ]+)=([^ ]+)')
I also need these fields:
Aug 13 17:16:33 app-srv01 kernel: newConnection -
Regex result:
[('IN', 'eth0'), ('MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'), ('SRC', '91.103.125.80'), ('DST', '45.33.223.166'), ('LEN', '52'), ('TOS', '0x00'), ('PREC', '0x00'), ('TTL', '113'), ('ID', '21200'), ('PROTO', 'TCP'), ('SPT', '55743'), ('DPT', '445'), ('WINDOW', '8192'), ('RES', '0x00'), ('URGP', '0')]
Log String:
Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0
Expected output:
[('Aug 13 17:16:33'), ('app-srv01 kernel:'), ('newConnection -'),
 ('IN', 'eth0'), ('MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'),
 ('SRC', '91.103.125.80'), ('DST', '45.33.223.166'), ('LEN', '52'),
 ('TOS', '0x00'), ('PREC', '0x00'), ('TTL', '113'), ('ID', '21200'),
 ('PROTO', 'TCP'), ('SPT', '55743'), ('DPT', '445'), ('WINDOW', '8192'),
 ('RES', '0x00'), ('URGP', '0')]
Can someone help? I'm using Python 3. Thanks.
Upvotes: 1
Views: 940
Reputation: 163457
If you want the date at the start (as discussed in the comments, the other two parts, app-srv01 kernel: and newConnection -, are less important) and you also want the matches from your current pattern, you could use an alternation:
^([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2})|([^ ]+)=([^ ]+)
^ : Start of the string
([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) : Capture group 1, matches a "date like" pattern
| : Or
([^ ]+)=([^ ]+) : Your initial pattern, capturing the key and value in group 2 and group 3

For example:
import re
regex = r"^([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2})|([^ ]+)=([^ ]+)"
test_str = "Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0"
print(list(map(lambda x: tuple(filter(None, x)), re.findall(regex, test_str))))
Result
[('Aug 13 17:16:33',), ('IN', 'eth0'), ('MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'), ('SRC', '91.103.125.80'), ('DST', '45.33.223.166'), ('LEN', '52'), ('TOS', '0x00'), ('PREC', '0x00'), ('TTL', '113'), ('ID', '21200'), ('PROTO', 'TCP'), ('SPT', '55743'), ('DPT', '445'), ('WINDOW', '8192'), ('RES', '0x00'), ('URGP', '0')]
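The alternation above captures only the date. If the host/kernel part and the log prefix are also wanted, as in the question's expected output, one option is to match the whole header once with an anchored pattern and then run the original key=value pattern starting where the header ends. This is a minimal sketch, not the answer's own code: the header layout (month, day, time, host, "kernel:", then a prefix ending in " -") is an assumption based on the single sample line, and HEADER, PAIR and parse_iptables_line are hypothetical names.

```python
import re

LINE = ("Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= "
        "MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 "
        "DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF "
        "PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0")

# Hypothetical header pattern (assumption based on the sample line):
#   date:   "Aug 13 17:16:33"; syslog pads one-digit days with a space ("Aug  3"), hence " +"
#   host:   "app-srv01 kernel:"
#   prefix: the iptables --log-prefix text, assumed here to end with " -"
HEADER = re.compile(
    r"^(?P<date>[A-Za-z]+ +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+ kernel:) "
    r"(?P<prefix>.*?-) "
)

# Same idea as the question's pattern: KEY=VALUE with a non-empty value,
# so "OUT=" and bare flags such as DF and SYN are skipped.
PAIR = re.compile(r"(\S+)=(\S+)")


def parse_iptables_line(line):
    """Return [date, host, prefix, (key, value), ...] for one log line."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError("line does not match the assumed header layout")
    header = [m.group("date"), m.group("host"), m.group("prefix")]
    # Pattern.findall accepts a start position, so scanning begins after the header.
    return header + PAIR.findall(line, m.end())


print(parse_iptables_line(LINE))
```

On the sample line this returns the three header strings followed by the same 15 (key, value) tuples shown in the result above, which lines up with the expected output in the question.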
Upvotes: 0
Reputation: 89574
You can do this with re.split, using a space that precedes an abc=def token as the separator, and then splitting each resulting item a second time on the equals sign:
[x.split('=') for x in re.split(r' (?=\S+=)', s)]
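The one-liner assumes re is imported and the log line is already in s. A self-contained version on the question's sample line might look like the sketch below; it uses split('=', 1) instead of split('=') as a small defensive tweak (an assumption, not part of the answer) so that a value containing "=" would not be split further.

```python
import re

s = ("Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= "
     "MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 "
     "DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF "
     "PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0")

# Split on each space that is immediately followed by a KEY= token.
# The lookahead (?=...) consumes nothing, so KEY= stays in the next chunk.
chunks = re.split(r' (?=\S+=)', s)

# Split each chunk once on "=". The header chunk contains no "=",
# so it comes back as a one-item list.
fields = [chunk.split('=', 1) for chunk in chunks]

print(fields)
```

Note what this approach does with the sample: the whole header stays in one item, OUT= becomes ['OUT', ''], and the bare flags stay attached to the preceding value (['ID', '21200 DF'] and ['RES', '0x00 SYN']), so some post-processing is needed if those must be separated.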
Upvotes: 0