Parsing log files to find related events in Python

Question

I have a log file that I need to parse to find out whether a certain event is followed by another related event, that is, whether the first event is alone or has an associated pair event. For example, the data is of the form:

Timestamp         Event        Property1        Property2      Property3
1445210282416     E1             A               1               Type1   *
1445210282434     F1             D               3               Type10      
1445210282490     E1             C               5               Type2
1445210282539     E2             A               1               Type1   *
1445210282943     F1             D               1               Type15 
1445210285452     E2             C               4               Type3

This is a simplified example but is essentially the same as the data file. We are trying to find whether an event E1 has a corresponding event E2, for which Property1, Property2 and Property3 must be equal, as in the two events marked with *. The second E1 event (row 3) doesn't have a corresponding E2 event. I also need to keep a count of such unpaired events, keyed by Property3, for later use.

The files can be quite large (around 1 GB), and I should avoid having the whole file in memory at the same time. So I figured I could use a generator.

An initial attempt is:

import csv

with open(filename, newline='') as f:
    finding_pair = 0      # indicator to help determine what to do in a line of the file
    e1 = {}               # store the E1 row whose pair we want to find
    without_pair = {}     # store count of E1 events with no pair, keyed by Property3

    # The file object is already a lazy line iterator, so it can be passed directly.
    # skipinitialspace collapses the runs of spaces used to align the columns.
    reader = csv.DictReader(f, delimiter=' ', skipinitialspace=True)

    for row in reader:
        if row['Event'] == 'E1' and finding_pair == 0:  # find pair for this
            # Go through file after this line to find E2 event.
            e1 = row
            finding_pair = 1
        elif row['Event'] in ('E1', 'F1') and finding_pair == 1:  # skip this and keep finding pair
            continue
        elif row['Event'] == 'E2' and finding_pair == 1:  # see if this is a pair
            if (row['Property1'] == e1['Property1'] and row['Property2'] == e1['Property2']
                    and row['Property3'] == e1['Property3']):
                # pair found
                finding_pair = 0
                # Go to next E1 line ??
            else:
                # pair not found
                without_pair[e1['Property3']] = without_pair.get(e1['Property3'], 0) + 1
                # Go to next E1 line ??

So, my questions are:

How do I move the iterator back to E1 in row 3 after already moving to E2 in row 4 to find my pair?
E1 and E2 should occur quite close in time (within 1 minute). How do I restrict the search for the pair to a 1-minute window after E1, rather than scanning the rest of the file?
Is there a better way of approaching this?
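
One possible direction, sketched here only as an illustration: instead of moving the iterator back, read the file once and remember every E1 that is still waiting for its E2. The sketch assumes that the timestamps are epoch milliseconds, that the lines are sorted by time, that the columns are whitespace-separated, and that an E2 pairs with the oldest waiting E1 that has the same three properties. The names `count_unpaired` and `window_ms` are hypothetical and not taken from the attempt above.

```python
from collections import Counter, deque


def count_unpaired(lines, window_ms=60_000):
    """Count E1 events that get no matching E2 within window_ms, keyed by Property3.

    Assumes time-sorted, whitespace-separated lines (see the lead-in).
    """
    waiting = {}          # (Property1, Property2, Property3) -> timestamps of unmatched E1s
    arrival = deque()     # (timestamp, key) of every E1 seen, oldest first
    without_pair = Counter()

    def expire(now):
        # Drop E1 events older than the window; now=None flushes everything.
        while arrival and (now is None or now - arrival[0][0] > window_ms):
            ts, key = arrival.popleft()
            queue = waiting.get(key)
            if queue and queue[0] == ts:      # still unmatched, so it has no pair
                queue.popleft()
                without_pair[key[2]] += 1
            if queue is not None and not queue:
                del waiting[key]              # keep memory bounded by the window

    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[0].isdigit():
            continue                          # header row, blank line, or malformed line
        ts, event, key = int(fields[0]), fields[1], tuple(fields[2:5])
        expire(ts)
        if event == "E1":
            waiting.setdefault(key, deque()).append(ts)
            arrival.append((ts, key))
        elif event == "E2":
            queue = waiting.get(key)
            if queue:
                queue.popleft()               # pair found: oldest waiting E1 with this key
    expire(None)                              # whatever is left at end of file is unpaired
    return without_pair
```

Because a file object yields its lines lazily, passing it directly (for example `with open(filename) as f: counts = count_unpaired(f)`) keeps only the E1 events from the last minute in memory, not the whole file. If the log is not strictly time-sorted, the expiry step would need to be adapted.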
