Reputation: 923
I am trying to write a custom log parser. The log file looks as follows:
09:57:25Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
SQ->
09:57:25Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
D-->
SQ->
09:57:28Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd
some string ---
D-->
SQ->
09:58:28Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcd
some string ---
D-->
SQ->
The goal is to have JSON output in the following format:
[{'host_name': host_name, 'time': '2017-04-13T09:58:28.1393344+00:00', 'msg
: '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
D-->
SQ->'}, {'host_name': host_name, 'time': '2017-04-13T09:58:28.1393344+00:00', 'msg
: '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
D-->
SQ->'}]
The problem I am facing is extracting the text between two timestamp lines, together with the time value.
Here is what I tried:
jsonlist = []
jsonout = {}
li = [i.strip().split() for i in open(filepath).readlines()]
start_index, end_index=0,0
msg = ''
with open(filepath, 'r') as f:
    for index, line in enumerate(f):
        if start_index !=0 and end_index!=0:
            result = list(itertools.chain.from_iterable(li[start_index: end_index]))
            msg = ''.join(str(x) for x in result)
            jsonoutput['message'] = msg.replace('"', '\\').strip()
            jsonoutput['time'] = msg.
            start_index, end_index = 0,0
        try:
            if start_index !=0:
                if parser(line.split()[0].split('Host_Name')[0]):
                    end_index = index
            else:
                start_index = index
I am not able to get the time value and the correct msg. Any suggestions for a better way of doing this would be very helpful.
Upvotes: 1
Views: 91
Reputation: 5074
I wrote my own code:
import json
import re


def logs(file_path):
    """
    Parse the log file and return its records as a JSON string.

    :param file_path: path to your log file, example: /home/user/my_file.log
    """
    msg = ''
    final = []
    with open(file_path, 'r') as our_log:
        for line in our_log:
            # A header line starts with a short time stamp such as 09:57:25
            time = re.search(r"^\d+:\d+:\d+", line)
            if time:
                if msg:
                    # attach the collected message to the previous time-section
                    final[-1].update(msg=msg)
                    msg = ''
                time = time.group(0)
                host_name = re.search(time + '(.*)' + ' Trace', line).group(1)
                # If you need the time like "09:57:25", instead of "2017-04-13T09:57:25.1393344+00:00"
                # then uncomment the line below
                # info = dict(time=time, host_name=host_name)
                # and comment the one below
                info = dict(host_name=host_name)
                final.append(info)
            elif final:  # skip anything that appears before the first header line
                # and also comment the next 3 lines
                if 'Time="' in line:
                    time = re.search('Time="([^"]*)"', line).group(1)
                    final[-1].update(time=time)
                msg += line.strip()
    if final:
        final[-1].update(msg=msg)  # adds message for the last time-section
    json_out = json.dumps(final)
    return json_out
Based on the data you provided, the variable final looks like:
[{'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'},
 {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'},
 {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:28.1393344+00:00', 'host_name': 'Host_Name '},
 {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:58:28.1393344+00:00', 'host_name': 'Host_Name '}]
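Note that the msg values above have the line breaks stripped out, while the format asked for in the question keeps them. If that matters, here is a minimal sketch of a variant that keeps them. The names parse_records, HEADER and TIME_ATTR are my own, and it assumes every record starts with a line of the form HH:MM:SS, then the host name without spaces, then " Trace", as in the sample log. Treat it as a starting point rather than a finished parser.

```python
import json
import re

# Assumed header layout: HH:MM:SS immediately followed by the host name, then " Trace".
HEADER = re.compile(r"^(\d{2}:\d{2}:\d{2})(\S+)\s+Trace\b")
# The full time stamp is read from the Time="..." attribute in the record body.
TIME_ATTR = re.compile(r'Time="([^"]*)"')


def parse_records(lines):
    """Group log lines into one dict per header line, keeping line breaks in msg."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        header = HEADER.match(line)
        if header:
            # A new header starts a new record; body lines are collected in a list.
            records.append({"host_name": header.group(2), "time": None, "msg": []})
        elif records:
            # Lines before the first header are ignored.
            records[-1]["msg"].append(line)
    for record in records:
        record["msg"] = "\n".join(record["msg"])
        found = TIME_ATTR.search(record["msg"])
        if found:
            record["time"] = found.group(1)
    return records


sample = """09:57:25Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
SQ->
09:57:28Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd
some string ---
D-->
SQ->
"""

records = parse_records(sample.splitlines())
print(json.dumps(records, indent=2))
```

Because parse_records accepts any iterable of lines, you can also pass it an open file object directly. A record with no Time="..." attribute keeps time as None, which json.dumps writes as null.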
Upvotes: 2