Reputation: 1475
Dear all,
I've run into a log-parsing problem that isn't trivial for me.
I need to go through a file and check whether each line matches a pattern: if it does, I want to get the ClientID specified in that line.
The line looks like this:
17.02.09 10:42:31.242 TRACE [1245] GDS: someText(SomeText).ClientID: '' -> '99071901'
So I need to get 99071901.
I tried to construct a regex search pattern, but it is incomplete; I got stuck at 'TRACE':
regex = '(^[(\d\.)]+) ([(\d\:)]+) ([\bTRACE\b]+) ([(\d)]+) ([\bGDS\b:)]+) ([\ClientID\b])'
The script code is:
log=open('t.log','r')
for i in log:
    key=re.search(regex,i)
    print(key.group()) #print string matching
    for g in key:
        client_id=re.search(????,g) # find ClientID
log.close()
I'd appreciate a hint on how to solve this.
Thank you.
Upvotes: 2
Views: 348
Reputation: 43517
You don't need to be too specific. You can just capture the sections and parse them individually.
Let's start with just your one line as an example:
line = "17.02.09 10:42:31.242 TRACE [1245] GDS: someText(SomeText).ClientID: '' -> '99071901'"
Then let's add a first regex that captures all the sections:
import re
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
# now extract each section
date, time, level, thread, module, message = line_regex.match(line).groups()
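To see what each captured section actually holds, here is a quick check that runs the regex above on the sample line and prints every section. Nothing here goes beyond the answer's own regex; it just makes visible that the thread keeps its square brackets and the module loses its trailing colon.

```python
import re

line = "17.02.09 10:42:31.242 TRACE [1245] GDS: someText(SomeText).ClientID: '' -> '99071901'"
line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')

date, time, level, thread, module, message = line_regex.match(line).groups()

# The thread section is "[1245]" (brackets included), and the module section is
# "GDS" because the ':' sits outside the fifth group in the pattern.
for name, value in [("date", date), ("time", time), ("level", level),
                    ("thread", thread), ("module", module), ("message", message)]:
    print(f"{name:8} {value}")
```

If you want the bare thread number, `thread.strip('[]')` removes the brackets.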
Now, if we look at the different sections, they contain all the information we need to make further decisions or to parse them further. Let's get the client ID when the right kind of message shows up.
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")
if 'ClientID' in message:
    client_id = client_id_regex.match(message).group(1)
And now we have the client_id.
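One thing to watch: `client_id_regex.match(message)` returns `None` when a message mentions ClientID in some other shape (for example when the old value is not empty), and calling `.group(1)` on `None` raises `AttributeError`. A minimal sketch of a guarded version follows; `get_client_id` is a hypothetical helper name, not part of the original answer.

```python
import re

client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

def get_client_id(message):
    """Return the new client ID, or None if the message has another shape.

    Hypothetical helper: wraps the answer's regex with a None check so that a
    message mentioning ClientID in a different format does not raise
    AttributeError on .group(1).
    """
    if 'ClientID' not in message:
        return None
    m = client_id_regex.match(message)
    return m.group(1) if m else None

print(get_client_id("someText(SomeText).ClientID: '' -> '99071901'"))  # 99071901
print(get_client_id("someText(SomeText).ClientID: '123' -> '456'"))     # None
```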
Just work that logic into your loop and you are all set.
import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")
with open('t.log', 'r') as f:  # use a with context manager to close the file automatically
    for line in f:  # let's iterate over the lines
        sections = line_regex.match(line)  # make a match object for the sections
        if not sections:
            continue  # you probably want to handle this case
        date, time, level, thread, module, message = sections.groups()
        if 'ClientID' in message:  # should we even look here for a client id?
            client_id = client_id_regex.match(message).group(1)
            # now do what you wanted to do
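If you want to collect the IDs rather than handle them one at a time, the loop above can be wrapped in a function that takes any iterable of lines. This is a sketch under a few assumptions: `extract_client_ids` is a hypothetical name, lines whose ClientID message has another shape are skipped instead of raising, and an in-memory `io.StringIO` with made-up lines stands in for `t.log` so the example runs on its own.

```python
import io
import re

line_regex = re.compile(r'(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.+)')
client_id_regex = re.compile(r".*ClientID: '' -> '(\d+)'")

def extract_client_ids(lines):
    """Collect client IDs from an iterable of log lines (hypothetical helper)."""
    client_ids = []
    for line in lines:
        sections = line_regex.match(line)
        if not sections:
            continue  # line does not have the expected layout; skip it
        date, time, level, thread, module, message = sections.groups()
        if 'ClientID' in message:
            m = client_id_regex.match(message)
            if m:  # skip ClientID messages in a different format
                client_ids.append(m.group(1))
    return client_ids

# Made-up two-line log standing in for t.log; only the first line has a ClientID.
sample_log = io.StringIO(
    "17.02.09 10:42:31.242 TRACE [1245] GDS: someText(SomeText).ClientID: '' -> '99071901'\n"
    "17.02.09 10:42:31.300 TRACE [1245] GDS: something else\n"
)
print(extract_client_ids(sample_log))  # ['99071901']
```

With a real file, `with open('t.log') as f: print(extract_client_ids(f))` works the same way, since a file object is also an iterable of lines.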
Upvotes: 2
Reputation: 627607
You may use capturing parentheses around the parts of the pattern you are interested in, and then access those parts using group(n), where n is the corresponding group ID:
import re
s = "17.02.09 10:42:31.242 TRACE [1245] GDS: someText(SomeText).ClientID: '' -> '99071901'"
regex = r"^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$"
m = re.search(regex, s)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.group(4))
    print(m.group(5))
See the Python online demo
The pattern is
^([\d.]+)\s+([\d.:]+)\s+(TRACE)\s+\[(\d+)] GDS:.*?ClientID:\s*''\s*->\s*'(\d+)'$
See its online demo here.
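With five groups, numeric indices are easy to mix up. A swapped-in variant (not the answer's own code) is to write the same pattern with `re.VERBOSE`, so each piece can carry a comment, and with named groups `(?P<name>...)`, so fields are read by name. The group names below are illustrative. Note that verbose mode ignores unescaped whitespace, so the single literal space before GDS is written as `\ `.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE and named groups.
regex = re.compile(r"""
    ^(?P<date>[\d.]+)\s+          # 17.02.09
    (?P<time>[\d.:]+)\s+          # 10:42:31.242
    (?P<level>TRACE)\s+           # log level
    \[(?P<thread>\d+)]\ GDS:      # [1245] GDS:
    .*?ClientID:\s*''\s*->\s*     # skip ahead to the ClientID change
    '(?P<client_id>\d+)'$         # '99071901'
""", re.VERBOSE)

s = "17.02.09 10:42:31.242 TRACE [1245] GDS: someText(SomeText).ClientID: '' -> '99071901'"
m = regex.search(s)
if m:
    print(m.group('client_id'))  # 99071901
    print(m.groupdict())         # all named fields as a dict
```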
Note that you have mixed up character classes and groups: (...) groups subpatterns and captures them, while [...] defines a character class that matches a single character.
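A few one-liners make the difference concrete. A character class matches one character from a set, so `[TRACE]+` accepts any run of those letters in any order; a group matches its contents as a sequence and captures it. Also, inside a class Python's `re` treats `\b` as the backspace character rather than a word boundary, so `[\bTRACE\b]` from the question does not mean what it appears to.

```python
import re

# Character class: one character from the set, repeated by '+'.
print(bool(re.fullmatch(r'[TRACE]+', 'CRATE')))  # True
# Group: the literal sequence T-R-A-C-E.
print(bool(re.fullmatch(r'(TRACE)', 'CRATE')))   # False

# Groups also capture, so the matched text can be read back.
m = re.search(r'(TRACE)\s+\[(\d+)]', '10:42:31.242 TRACE [1245] GDS:')
print(m.group(1), m.group(2))  # TRACE 1245

# Inside [...], \b is the backspace character, not a word boundary.
print(bool(re.fullmatch(r'[\bTRACE\b]', '\b')))  # True
```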
Upvotes: 1