Reputation: 11
I have a log file that I am trying to parse. An example line from the log file is below:
Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)
I want to pull out every piece of text that starts with a hash and has a key and a value, for example #msgtype=EVENT. Any hash that is not followed by a key and an "=" sign should be treated as part of the value.
So for the above log entry, I want a list that looks like this:
#msgtype=EVENT
#server=Web/Dev@server1web
#func=LKZ_WriteData ( line 2992 )
#rc=0
#msgid=XYZ0064
#reqid=0
#msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0) (Notice the hash present in the middle of the text)
I have tried Python's re.findall, but it does not capture all of the data.
For example:
import re
# "line" is used instead of "str" so the built-in str type is not shadowed
line = 'Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)'
z = re.findall("(#.+?=.+?)(:?#|$)", line)
print(z)
Output:
[('#msgtype=EVENT ', '#'), ('#func=LKZ_WriteData ( line 2992 ) ', '#'), ('#msgid=XYZ0064 ', '#'), ('#msg=Web Activity end (section 200, ', '#')]
Upvotes: 1
Views: 74
Reputation: 1167
import re
s = "Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)"
a = re.findall('#(?=[a-zA-Z]+=).+?=.*?(?= #[a-zA-Z]+=|$)', s)
# Split on the first "=" only, so a value that itself contains "=" stays intact
result = [item.split('=', 1) for item in a]
print(result)
Gives:
[['#msgtype', 'EVENT'], ['#server', 'Web/Dev@server1web'], ['#func', 'LKZ_WriteData ( line 2992 )'], ['#rc', '0'], ['#msgid', 'XYZ0064'], ['#reqid', '0'], ['#msg', 'Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)']]
Upvotes: 0
Reputation: 626738
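The (:?#|$) is a capturing group that matches an optional : and then #, or the end of the string. Since re.findall returns all captured substrings when the pattern contains groups, the result is a list of tuples. The group also consumes the # that starts the next key, so every second key=value pair is skipped.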
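To make the difference concrete, here is a small sketch on a shortened, made-up input (the `sample` string is an assumption for illustration, not taken from the log). It compares the original pattern with a variant that uses a lookahead, which checks for the next # without consuming it:

```python
import re

# Hypothetical shortened input, used only to keep the output easy to read
sample = "#a=1 #b=2 #c=3"

# Original pattern: two capturing groups, so findall returns tuples,
# and the "#" matched by the second group cannot start the next match.
with_groups = re.findall(r"(#.+?=.+?)(:?#|$)", sample)
print(with_groups)     # [('#a=1 ', '#'), ('#c=3', '')]

# Lookahead variant: no groups, so findall returns whole matches,
# and the next "#" is only inspected, not consumed.
with_lookahead = re.findall(r"#.+?=.+?(?=\s*#|$)", sample)
print(with_lookahead)  # ['#a=1', '#b=2', '#c=3']
```

Note that this simplified lookahead stops at any #, so it would still cut a value that contains a bare hash; the stricter pattern below handles that case.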
You need
re.findall(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)', text)
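As a rough sketch of how this pattern could be used on the log line from the question, the snippet below collects the pairs into a dict. The names `PAIR_RE` and `parse_fields` are hypothetical helpers introduced here for illustration, and splitting with `str.partition` is one possible choice, not part of the original answer:

```python
import re

# The pattern from this answer, precompiled so it can be reused per line
PAIR_RE = re.compile(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)')

def parse_fields(line):
    """Return a dict mapping '#key' to its value for one log line."""
    fields = {}
    for pair in PAIR_RE.findall(line):
        # partition splits on the first "=" only, so "=" inside a value is kept
        key, _, value = pair.partition('=')
        fields[key] = value
    return fields

line = ("Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT "
        "#server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) "
        "#rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, "
        "# SysD 1, Files 222, Bytes 343422089928, Errors 0, "
        "Aborted Files 0, Busy Files 0)")

fields = parse_fields(line)
for key, value in fields.items():
    print(f"{key}={value}")
```

The bare "# SysD" inside #msg stays in the value because the lookahead only stops at a # that is immediately followed by a key and an = sign.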
Regex details
#[^\s=]+ - # and then any 1+ chars other than whitespace and =
= - a = char
.*? - any 0+ chars other than line break chars, as few as possible
(?=\s*#[^\s=]+=|$) - up to (and excluding) 0+ whitespaces, #, 1+ chars other than whitespace and =, and then =, or up to the end of the string.
Upvotes: 1