pythonhmmm

Parse and fetch values between two time objects in a log file

I am trying to write a custom log parser. The log file looks like this:

     09:57:25Host_Name  Trace                      00000                                                  
      <MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00"  abcd 
      some string ---
        SQ-> 
    09:57:25Host_Name  Trace                      00000                                                  
     <MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
      some string ---
        D--> 
        SQ-> 
    09:57:28Host_Name   Trace                      00000                                                  
     <MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd 
      some string ---
        D--> 
        SQ-> 
    09:58:28Host_Name   Trace                      00000                                                  
     <MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcd 
      some string ---
        D--> 
        SQ-> 

The goal is to produce JSON output in the following format:
[{'host_name': host_name, 'time': '2017-04-13T09:57:25.1393344+00:00', 'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
  some string ---
    D--> 
    SQ->'}, {'host_name': host_name, 'time': '2017-04-13T09:57:25.1393344+00:00', 'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
  some string ---
    D--> 
    SQ->'}]
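
For reference, the structure above is written with Python-style single quotes, while real JSON uses double quotes. Python's json.dumps produces that form and escapes embedded double quotes and newlines on its own, so the message text should not need any manual escaping. A minimal sketch with a hypothetical, hard-coded record (in practice the values would come from the parser):

```python
import json

# A hypothetical record shaped like the target output above; the message
# keeps its line breaks and contains double quotes.
record = {
    "host_name": "Host_Name",
    "time": "2017-04-13T09:57:25.1393344+00:00",
    "msg": '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd\n'
           "some string ---\nD-->\nSQ->",
}

# json.dumps escapes the embedded quotes as \" and the newlines as \n,
# so the message can be passed through unchanged.
json_out = json.dumps([record])
print(json_out)
```

Round-tripping with json.loads(json_out) gives back the original list of dicts.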

The problem I am facing is getting the text between two timestamp lines, together with the time itself.

Here is what I tried:

import itertools

jsonlist = []
jsonout = {}
li = [i.strip().split() for i in open(filepath).readlines()]
start_index, end_index = 0, 0
msg = ''
with open(filepath, 'r') as f:
    for index, line in enumerate(f):
        if start_index != 0 and end_index != 0:
            # join everything between the two timestamp lines into one string
            result = list(itertools.chain.from_iterable(li[start_index:end_index]))
            msg = ''.join(str(x) for x in result)
            jsonout['message'] = msg.replace('"', '\\').strip()
            jsonout['time'] = ...  # not sure how to extract the time value here
            start_index, end_index = 0, 0
        if start_index != 0:
            if parser(line.split()[0].split('Host_Name')[0]):
                end_index = index
            else:
                start_index = index

I am not able to get the time value or the correct message. Any suggestions for doing this in a better way would be very helpful.
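
One possible alternative, sketched under the assumption that every record starts with a header line of the form HH:MM:SS immediately followed by the host name and then Trace (as in the sample above), is to read the whole file as one string, locate the header lines with a multiline regular expression, and treat everything between two consecutive headers as one record's message. The names parse_records, HEADER and TIME_ATTR are hypothetical, and this is only one way to approach it:

```python
import re

# Assumption: each record begins with a header line such as
# "09:57:25Host_Name  Trace   00000", possibly indented with spaces/tabs.
HEADER = re.compile(r"^[ \t]*(\d{2}:\d{2}:\d{2})(\S+)[ \t]+Trace\b.*$", re.MULTILINE)
# The full ISO timestamp lives in the Time="..." attribute of the body.
TIME_ATTR = re.compile(r'Time="([^"]+)"')


def parse_records(text):
    """Return one dict per record with host_name, time and msg keys."""
    headers = list(HEADER.finditer(text))
    records = []
    for i, header in enumerate(headers):
        # The body runs from the end of this header line to the start of the
        # next header, or to the end of the text for the last record.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body_lines = [ln.strip() for ln in text[header.end():end].splitlines()]
        # Drop empty lines and keep the remaining ones separated by newlines.
        msg = "\n".join(ln for ln in body_lines if ln)
        attr = TIME_ATTR.search(msg)
        records.append({
            "host_name": header.group(2),
            # Fall back to the short HH:MM:SS time if there is no Time="..." attribute.
            "time": attr.group(1) if attr else header.group(1),
            "msg": msg,
        })
    return records
```

Usage would be along the lines of `with open(filepath) as f: records = parse_records(f.read())`, followed by `json.dumps(records)`. Joining the lines with "\n" keeps the line breaks shown in the target format; using "".join instead would concatenate them.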

Answers (1)

TitanFighter

I wrote my own code:

import json
import re


def logs(file_path):
    """
    :param file_path: path to your log file, example: /home/user/my_file.log
    :return: the parsed records as a JSON string
    """
    msg = ''
    final = []

    # read all lines; the with-block closes the file afterwards
    with open(file_path, 'r') as our_log:
        log_lines = our_log.readlines()

    for line in log_lines:
        # a line starting with HH:MM:SS begins a new record
        time = re.search(r"^\d+:\d+:\d+", line)

        if time:
            if msg:
                # store the message collected for the previous record
                final[-1].update(msg=msg)
                msg = ''

            time = time.group(0)
            host_name = re.search(time + '(.*)' + '  Trace', line).group(1)

            # If you need the time like "09:57:25" instead of "2017-04-13T09:57:25.1393344+00:00",
            # then uncomment the line below
            # info = dict(time=time, host_name=host_name)

            # and comment out the one below
            info = dict(host_name=host_name)

            final.append(info)

        else:
            # and also comment out the next 3 lines
            if 'Time="' in line:
                time = re.search('Time="' + '(.*)' + '"', line).group(1)
                final[-1].update(time=time)
            msg += line.strip()

    final[-1].update(msg=msg)  # adds the message for the last time-section

    json_out = json.dumps(final)
    return json_out

Based on the data you provided, the variable final looks like this:

[{'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00"  abcdsome string ---SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'},
 {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'},
 {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:28.1393344+00:00', 'host_name': 'Host_Name '},
 {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:58:28.1393344+00:00', 'host_name': 'Host_Name '}]
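
The same line-by-line idea can also be written as a generator, so records are produced while the file is being read instead of after readlines() has loaded everything. This is a sketch, not part of the answer's code; iter_records, HEADER and TIME_ATTR are hypothetical names, and the header pattern assumes the HH:MM:SS + host name + Trace layout from the question. Capturing the host name with \S+ also avoids the trailing space seen in 'Host_Name ' above:

```python
import re
from typing import Iterable, Iterator

# Assumed header layout: optional indentation, then HH:MM:SS immediately
# followed by the host name, whitespace and the word "Trace".
HEADER = re.compile(r"^\s*(\d+:\d+:\d+)(\S+)\s+Trace\b")
# Full ISO timestamp inside the <MessageLogTraceRecord Time="..."> line.
TIME_ATTR = re.compile(r'Time="([^"]+)"')


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {'host_name', 'time', 'msg'} dict per record."""
    current = None  # record being built, None until the first header
    body = []       # stripped message lines of the current record

    for line in lines:
        header = HEADER.match(line)
        if header:
            if current is not None:
                current["msg"] = "".join(body)
                yield current
            # Start a new record; the short HH:MM:SS time is kept only
            # until a Time="..." attribute is found.
            current = {"host_name": header.group(2), "time": header.group(1)}
            body = []
        elif current is not None:
            attr = TIME_ATTR.search(line)
            if attr:
                current["time"] = attr.group(1)
            body.append(line.strip())

    if current is not None:  # flush the last record
        current["msg"] = "".join(body)
        yield current
```

Usage: `with open(file_path) as f: json_out = json.dumps(list(iter_records(f)))`. Because a file object is iterated line by line, memory use stays proportional to one record rather than the whole file.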
