I have a debug log file as you can see below:
Sample file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
I want to fetch only the IDs and the final output as shown below.
Expected output:
<ID> "output
output output
output"
I would like to do this in either Python or Bash. Any help would be appreciated. Thanks.
My current code works for the "Final output" part only, but I also want to fetch the IDs, and there should be a separator that distinguishes each ID and its output.
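stream = open("debuglog.txt", "r")
lines = stream.readlines()
flag = 0
for i in lines:
    # Any new DEBUG entry ends the output being printed.
    if "DEBUG:" in i:
        flag = 0
    # Start printing at a "Final output is" entry (case matches the log).
    if "Final output is" in i:
        flag = 1
    if flag:
        print(i)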
Upvotes: 1
Views: 547
Reputation: 22012
With python, how about:
#!/usr/bin/python
import re

text = open("logfile", "r").read()
regex = r'start (.+?)$.*?Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
for m in re.finditer(regex, text, re.MULTILINE|re.DOTALL):
    for i in m.groups():
        print(i.replace('\n', ' '))
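To see why the re.DOTALL flag in the script above matters, here is a small sketch that runs the same regex with and without it. The shortened sample text is made up for illustration and is not taken from a real log.

```python
import re

# Shortened, made-up sample with one ID and a two-line final output.
text = 'DEBUG: t: start <ID>\nDEBUG: t: Final output is "a\nb"\nDEBUG: extra\n'
regex = r'start (.+?)$.*?Final output is (.+?)(?:(?=\nDEBUG)|\Z)'

# With DOTALL, "." also matches newlines, so the match can span several lines.
with_dotall = re.search(regex, text, re.MULTILINE | re.DOTALL)
# Without DOTALL, "." stops at the first newline and the regex cannot match.
without_dotall = re.search(regex, text, re.MULTILINE)

print(with_dotall.groups())
print(without_dotall)
```

The first call returns both groups, with the newline still inside the second one; the second call finds no match at all.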
Input logfile:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output2
output+ output/
output2"
And the output:
<ID>
"output output output output"
<ID2>
"output2 output+ output/ output2"
The regex searches for the string after "start" and before the newline and stores it in the 1st group.
It then searches for the string after "Final output is" and before "DEBUG" or the end of the string and stores it in the 2nd group.
Newlines can be included in the string because of the re.DOTALL option.
EDIT
The updated version below handles multiple "final output" for a single ID and displays only the last output for each ID:
#!/usr/bin/python
import re

text = open("logfile", "r").read()
# regex captures each ID and the block of text up to the next "start" line
regex = r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)'
# regex2 captures every final output inside one block
regex2 = r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
for m in re.finditer(regex, text, re.MULTILINE|re.DOTALL):
    print(m.group(1))
    m2 = re.finditer(regex2, m.group(2), re.MULTILINE|re.DOTALL)
    # keep only the last final output for this ID
    print(list(m2).pop().group(1).replace('\n', ' '))
Input logfile:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "this
is the last output
for <ID1>"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output2
output+ output/
output2"
And the output:
<ID1>
"this is the last output for <ID1>"
<ID2>
"output2 output+ output/ output2"
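I've divided the extraction of substrings into two steps:
1. Extract each ID and the block of lines that follows it, up to the next "start" line, with regex.
2. Extract every final output within that block with regex2.
Then pick the last "final output" and display it.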
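The two steps can also be wrapped in a function, which makes them easier to test. This is a minimal sketch rather than the exact script above: the name last_outputs and the inline sample text are made up for illustration, and IDs that never log a final output are skipped instead of raising an error.

```python
import re

# Step 1: capture the ID and everything up to the next "start" line (or end of text).
BLOCK_RE = re.compile(r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)',
                      re.MULTILINE | re.DOTALL)
# Step 2: capture each final output up to the next DEBUG line (or end of block).
OUTPUT_RE = re.compile(r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)', re.DOTALL)


def last_outputs(text):
    """Return a list of (id, last final output) pairs with newlines flattened."""
    results = []
    for block in BLOCK_RE.finditer(text):
        outputs = OUTPUT_RE.findall(block.group(2))
        if outputs:  # skip IDs that never logged a final output
            # strip() drops a trailing newline captured at the end of a block
            last = outputs[-1].strip().replace('\n', ' ')
            results.append((block.group(1), last))
    return results


# Made-up sample: <ID1> logs two final outputs, <ID2> logs one.
sample = (
    'DEBUG: t:1: start <ID1>\n'
    'DEBUG: t:1: Final output is "first"\n'
    'DEBUG: extra lines\n'
    'DEBUG: t:1: Final output is "last\n'
    'one"\n'
    'DEBUG: t:2: start <ID2>\n'
    'DEBUG: t:2: Final output is "only"\n'
)
for log_id, output in last_outputs(sample):
    print(log_id, output)
```

Returning pairs instead of printing directly leaves the choice of separator between an ID and its output to the caller.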
EDIT
The version below suppresses any message that contains a given keyword and prints an error line instead:
#!/usr/bin/python
import re

text = open("logfile", "r").read()
exclude = 'xyz'  # keyword to suppress the output
regex = r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)'
regex2 = r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
for m in re.finditer(regex, text, re.MULTILINE|re.DOTALL):
    print(m.group(1))
    m2 = re.finditer(regex2, m.group(2), re.MULTILINE|re.DOTALL)
    # last final output for this ID, flattened to one line
    message = list(m2).pop().group(1).replace('\n', ' ')
    if exclude in message:
        print('error:' + exclude)
    else:
        print(message)
Sample logfile:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "this
is the last output
for ID1"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output2
output+ output/
output2"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID3>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "this message
contains the word xyz"
DEBUG: extra lines
The output:
<ID1>
"this is the last output for ID1"
<ID2>
"output2 output+ output/ output2"
<ID3>
error:xyz
Upvotes: 0
Reputation: 8711
With Perl, you can do it with a one-liner, provided the file fits into memory.
/tmp> cat debug.log
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16921 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16921: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16921: Final output is "output output output output"
DEBUG: extra lines
/tmp>
/tmp> perl -0777 -ne ' while(/^DEBUG(.+?)start (\S+).*?DEBUG.+?Final output is \"(.+?)\"/smg) { print "$2 $3\n" } ' debug.log
<ID1> output
output output
output
<ID2> output output output output
/tmp>
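If the log is too large to read in one go, the same extraction can be done line by line. The sketch below is a Python alternative, not a translation of the Perl one-liner. The function name iter_outputs is hypothetical, and it assumes that every log entry begins with "DEBUG:" and that the continuation lines of a multi-line output never do.

```python
import io
import re

START_RE = re.compile(r'^DEBUG:.*\bstart (\S+)')
FINAL_RE = re.compile(r'^DEBUG:.*Final output is (.*)')


def iter_outputs(lines):
    """Yield (id, output) pairs from an iterable of log lines."""
    current_id = None
    buffer = None  # lines of the output being collected, or None
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('DEBUG:'):
            # A new log entry ends any output that was being collected.
            if buffer is not None:
                yield current_id, '\n'.join(buffer)
                buffer = None
            m = START_RE.match(line)
            if m:
                current_id = m.group(1)
                continue
            m = FINAL_RE.match(line)
            if m:
                buffer = [m.group(1)]
        elif buffer is not None:
            buffer.append(line)  # continuation of a multi-line output
    if buffer is not None:  # output that runs to the end of the input
        yield current_id, '\n'.join(buffer)


# In-memory stand-in for a log file, using the sample from the question.
log = io.StringIO(
    'DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text\n'
    'DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>\n'
    'DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output\n'
    'output output\n'
    'output"\n'
    'DEBUG: extra lines\n'
)
for log_id, output in iter_outputs(log):
    print(log_id, output)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the io.StringIO stand-in, so only one output is held in memory at a time.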
Upvotes: 0
Reputation: 4486
Sample log file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start 12324
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output output output output"
DEBUG: extra lines
Here is the code. I am assuming you have only one instance of each ID and output.
import re

stream = open("log", "r")
lines = stream.readlines()
flag_ID = 0
flag_output = 0
flag_print = 1
for i in lines:
    # Raw strings keep \w and \d from being treated as string escapes.
    ID = re.match(r"DEBUG: [\w :]* start (\d+)", i)
    output = re.match(r'DEBUG: [\w :]* Final output is "([\w ]*)"', i)
    if ID:
        flag_ID = 1
        value_ID = ID.group(1)
    if output:
        flag_output = 1
        value_output = output.group(1)
    # Print once, as soon as both the ID and the output have been seen.
    if flag_output == 1 and flag_ID == 1 and flag_print == 1:
        print("{0} {1}".format(value_ID, value_output))
        flag_print = 0
Output:
12324 output output output output
Please tick mark and accept if this solves your problem ;)
Upvotes: 2