Reputation: 678
I have this log text file:
omer| (stmt : 0) | adminT| Connection id - 0
omer| (stmt : 0) | adminT| Start Time - 2018-11-06 16:52:01
omer| (stmt : 0) | adminT| Statement create or replace table amit (x date);
omer| (stmt : 0)| adminT| Connection id - 0 - Executing - create or replace table amit (x date);
omer| (stmt : 0) | adminT| Connection id - 0
omer| (stmt : 0) | adminT| End Time - 2018-11-06 16:52:01
omer| (stmt : 0) | adminT| SQL - create or replace table amit (x date);
omer| (stmt : 0) | adminT| Success
admin| (stmt : 1) | adminT| Connection id - 0
admin| (stmt : 1) | adminT| Start Time - 2018-11-06 16:52:14
admin| (stmt : 1) | adminT| Statement create or replace table amit (x int, y int);
admin| (stmt : 1)| adminT| Connection id - 0 - Executing - create or replace table amit (x int, y int);
admin| (stmt : 1) | adminT| Connection id - 0
admin| (stmt : 1) | adminT| End Time - 2018-11-06 16:52:15
admin| (stmt : 2) | adminT| Connection id - 0
admin| (stmt : 2) | adminT| Start Time - 2018-11-06 16:52:19
admin| (stmt : 2) | adminT| Statement create table amit (x int, y int);
admin| (stmt : 2) | adminT| Connection id - 0
admin| (stmt : 2) | adminT| End Time - 2018-11-06 16:52:22
admin| (stmt : 2) | adminT| SQL - Can't create table 'public.amit' - a table with the same name already exists
admin| (stmt : 2) | adminT| Failed
now I want to know the delta between start date to end date (as can be seen in the end of the line), next I want to know if the statement is successful or not (marked by Failed or Success). and then I want to calculate the delta from start time and end time, so this is the code I implemented:
def parse_log_file(log_file):
print(len(""))
my_path = os.path.abspath(os.path.dirname(__file__))
path = os.path.join(my_path, log_file)
max_delta = 0
with open(path, 'r') as f:
lines = f.readlines()[1:]
for line in lines:
elements = line.split('|')
# strip the lines of surrounding spaces
elements = [t.strip() for t in elements]
statement_id = elements[6]
if "Start Time" in elements[8] and statement_id in elements[6]:
start_date = get_date_parsed(elements[8])
if "End Time" in elements[8] and statement_id in elements[6]:
end_date = get_date_parsed(elements[8])
date_time_start_obj = datetime.datetime.strptime(start_date, '%Y-%m-%d %H:%M:%S')
date_time_end_obj = datetime.datetime.strptime(end_date, '%Y-%m-%d %H:%M:%S')
delta = date_time_end_obj - date_time_start_obj
if delta.seconds > max_delta:
max_delta = delta
print(max_delta)
print("hello")
def get_date_parsed(date_str):
res = date_str.split(' ')[3] + ' ' + date_str.split(' ')[4]
return res
Now I want to know if there is a way to know if the next lines contain 'Success' so the date calculation would be valid.
Upvotes: 0
Views: 126
Reputation: 3503
Updated codes to match your full log format following:
2018-11-06 16:54:43.350| on thread[140447603222272 c23]| IP[192.168.0.214:5000]| master| 192.168.0.244| sqream| (stmt : 30) | sqream| Connection id - 23
Code:
import re
from datetime import datetime
class Event:
__slots__ = ('start', 'statement', 'end', 'success', 'stmt')
# This limits what attribute class can have. Originally class use Dictionary to save attributes,
# but using __slots__ uses Tuple instead, and saves memory if there's lots of class instances.
"""
Class isn't necessary, and longer than other solutions.
But class gives you more control when expansion / changes are needed.
"""
def __init__(self, start, statement, end, success):
self.start = start.split('- ')[-1]
self.statement = statement.split('Statement ')[-1].strip(';')
self.end = end.split('- ')[-1]
self.success = success.split()[-1]
self.stmt = re.search(r"(?<=stmt : )[^)]*", statement).group(0)
def __str__(self):
"""
When str() or print() is called on this class instances - this will be output.
"""
return f"Event starting at {self.start}, Took {self.delta_time} sec."
def __repr__(self):
"""
repr should returns string with data that is enough to recreate class instance.
"""
output = [f"stmt : {self.stmt}",
f"Took : {self.delta_time} sec",
f"Statement: {self.statement}",
f"Status : {self.success}",
f"Start : {self.start}",
f"End : {self.end}"]
return '\n'.join(output)
@property
def delta_time(self):
"""
Converting string to datetime object to perform delta time calculation.
"""
date_format = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime(self.start, date_format)
end = datetime.strptime(self.end, date_format)
return (end - start).total_seconds()
def generate_events(file):
def line_yield():
"""
Generates line inside file without need to load whole file in memory.
As generator is one-shot, using this to simplify pause / continue of
line iteration.
"""
for line_ in file:
yield line_.strip("\n")
find_list = ("Start Time", "Statement", "End Time")
generator = line_yield()
while True:
group = []
for target in find_list:
for line in generator: # our generator keeps state after the loop.
if target in line: # 'in' finds faster than regex.
group.append(line)
break
for line in generator: # now find either statement was Successful or not.
if "Success" in line or "Failed" in line:
group.append(line)
break
try:
yield Event(*group)
except TypeError:
return
def find_slowest(log_file):
formed = list(generate_events(log_file))
sorted_output = sorted(formed, key=lambda event_: event_.delta_time)
print("Recorded Events:")
for output in sorted_output:
print(output)
late_runner = sorted_output[-1]
print('\n< Slowest >')
print(repr(late_runner))
with open("logfile.log", 'r') as log:
find_slowest(log)
Results with full log file:
Recorded Events:
Event starting at 2018-11-06 16:52:01, Took 0.0 sec.
Event starting at 2018-11-06 16:52:19, Took 0.0 sec.
Event starting at 2018-11-06 16:52:27, Took 0.0 sec.
Event starting at 2018-11-06 16:52:28, Took 0.0 sec.
Event starting at 2018-11-06 16:52:30, Took 0.0 sec.
Event starting at 2018-11-06 16:52:33, Took 0.0 sec.
Event starting at 2018-11-06 16:52:38, Took 0.0 sec.
Event starting at 2018-11-06 16:52:54, Took 0.0 sec.
Event starting at 2018-11-06 16:53:04, Took 0.0 sec.
Event starting at 2018-11-06 16:53:05, Took 0.0 sec.
Event starting at 2018-11-06 16:53:18, Took 0.0 sec.
Event starting at 2018-11-06 16:53:32, Took 0.0 sec.
Event starting at 2018-11-06 16:53:36, Took 0.0 sec.
Event starting at 2018-11-06 16:53:51, Took 0.0 sec.
Event starting at 2018-11-06 16:53:55, Took 0.0 sec.
Event starting at 2018-11-06 16:53:56, Took 0.0 sec.
Event starting at 2018-11-06 16:54:03, Took 0.0 sec.
Event starting at 2018-11-06 16:54:07, Took 0.0 sec.
Event starting at 2018-11-06 16:54:27, Took 0.0 sec.
Event starting at 2018-11-06 16:54:36, Took 0.0 sec.
Event starting at 2018-11-06 16:52:14, Took 1.0 sec.
Event starting at 2018-11-06 16:53:25, Took 1.0 sec.
Event starting at 2018-11-06 16:53:40, Took 1.0 sec.
Event starting at 2018-11-06 16:54:21, Took 1.0 sec.
< Slowest >
stmt : 27
Took : 1.0 sec
Statement: drop table tati
Status : Success
Start : 2018-11-06 16:54:21
End : 2018-11-06 16:54:22
Process finished with exit code 0
Upvotes: 1
Reputation: 12503
Here's a solution that is based on a set of regular expressions - one for each pattern you're looking for. In the end, I'm storing all the data in a pandas dataframe for analysis.
statement_id_re = re.compile(r"\(stmt : (\d+)\)")
end_re = re.compile(r"End Time - (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$")
start_re = re.compile(r"Start Time - (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$")
success_re = re.compile(r"\|\s+Success$")
all_statements = []
current_statement = {}
for line in file:
statement_id = statement_id_re.search(line).groups()[0]
start = start_re.search(line)
end = end_re.search(line)
success = success_re.search(line)
if start:
current_statement = {
"id": statement_id,
"start": start.groups()[0]
}
elif success:
current_statement["status"] = "success"
elif end:
current_statement["end"] = end.groups()[0]
all_statements.append(current_statement)
else:
pass
df = pd.DataFrame(all_statements)
df.start = pd.to_datetime(df.start)
df.end = pd.to_datetime(df.end)
df["duration"] = df.end - df.start
slowest = df.loc[df.duration.idxmin()]
print(f"The slowest statement is {slowest['id']} and it took {slowest['duration']}")
The result for your data is:
The slowest statement is 0 and it took 0 days 00:00:00
Upvotes: 1