cuda_hpc80

Reputation: 607

Parse text files using Python

I have a log file that I would like to parse and plot using matplotlib. The data of interest starts after the first 6 lines. For example, my log file looks like this:

# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350
# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317

I want to make a plot of the Classif Err vs Test set error for each Epoch.

My first attempt at this:

import numpy
from numpy import *
from pylab import *

f1 = open('log.txt', 'r')
FILE = f1.readlines()
f1.close()

for line in FILE:
    line = line.strip()
    if ('Epoch' in line):
        epoch += line.split('Epoch = ')
    elif('Test set error' in line):
        test_err += line.split('Test set error = ')

I see this error:

Traceback (most recent call last):
  File "logfileparse.py", line 18, in <module>
    epoch += line.split('Epoch = ')
NameError: name 'epoch' is not defined

Upvotes: 1

Views: 392

Answers (4)

Padraic Cunningham

Reputation: 180391

This will find Epoch and its value, appending it to a list.

epoch = []       # define both lists before the loop
test_error = []
with open('log.txt', 'r') as f:  # use with to open files, as it closes the file automatically
    for line in f:
        if "Epoch" in line:
            epoch.append(line[line.find("Epoch ="):].split(',')[0])
        elif "Test set error" in line:
            # strip() removes the trailing newline, since this line has no comma to split on
            test_error.append(line[line.find("Test set error ="):].split(',')[0].strip())
print(epoch)       # ['Epoch = 1']
print(test_error)  # ['Test set error = 37.2317']

This uses the index of "Epoch =" (or "Test set error =") to slice the string, splits the result on ',' and appends the first element, e.g. "Epoch = 1", to the matching list.
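The entries collected this way are still strings such as 'Epoch = 1', and matplotlib needs numbers. A minimal sketch of the conversion, assuming every entry has the form 'Name = value'; value_of is a hypothetical helper name, not part of any library:

```python
def value_of(entry):
    """Hypothetical helper: return the numeric part of 'Name = value' as a float."""
    # partition splits once on the first '=', so ' 1' ends up in the third slot;
    # float() ignores the surrounding whitespace (including a trailing newline)
    _, _, value = entry.partition('=')
    return float(value)


epoch = ['Epoch = 1']
test_error = ['Test set error = 37.2317']
epoch_values = [value_of(e) for e in epoch]
test_error_values = [value_of(t) for t in test_error]
print(epoch_values, test_error_values)  # [1.0] [37.2317]
```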

Upvotes: 1

Jordi Pallares

Reputation: 109

You never initialize the variable epoch (test_err has the same problem). It has to be defined, e.g. as epoch = [], before this line runs:

epoch += line.split('Epoch = ')
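Applied to the loop from the question, that looks roughly like the sketch below. It assumes the log lines look exactly like the two sample lines, and parse_log is a hypothetical name. It also extracts just the number, because line.split('Epoch = ') on its own returns both the text before and the text after the marker:

```python
def parse_log(lines):
    """Hypothetical helper: collect epoch numbers and test set errors from log lines."""
    # Initialize both lists once, before the loop, so append has something to work on
    epoch = []
    test_err = []
    for line in lines:
        line = line.strip()
        if 'Epoch = ' in line:
            # Text after 'Epoch = ' up to the next comma, e.g. '1'
            epoch.append(int(line.split('Epoch = ')[1].split(',')[0]))
        elif 'Test set error = ' in line:
            # Everything after the marker is the error value, e.g. '37.2317'
            test_err.append(float(line.split('Test set error = ')[1]))
    return epoch, test_err
```

With an open file f, something like epoch, test_err = parse_log(itertools.islice(f, 6, None)) would skip the first 6 lines, although lines without either marker are ignored anyway.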

Upvotes: 0

Burhan Khalid

Reputation: 174614

I guess you need each epoch together with its test set error so you can plot them. Assuming the error line always comes directly after the line with 'Epoch', try this:

import re

data_points = []
ep = r'Epoch = (\d+), batch = \d+, Classif Err = (\d+\.?\d+)'

with open('file.txt') as f:
    for line in f:
        epoch = re.findall(ep, line)
        if epoch:
            error_line = next(f)  # grab the next line, which is the error line
            error_value = error_line[error_line.rfind('=')+1:]
            data_points.append(list(map(float, epoch[0] + (error_value,))))

Now data_points is a list of lists: the first value is the epoch, the second is the Classif Err value and the third is the test set error.
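Since the question's end goal is a plot, here is one way to turn data_points into columns and draw them with matplotlib. This is only a sketch: it assumes you want Classif Err and Test set error drawn against the epoch on one set of axes, and split_columns, plot_errors and the output filename errors.png are hypothetical names.

```python
def split_columns(data_points):
    """Hypothetical helper: turn [[epoch, classif_err, test_err], ...] into three lists."""
    if not data_points:
        return [], [], []
    # zip(*rows) transposes the rows into columns
    epochs, classif_errs, test_errs = (list(col) for col in zip(*data_points))
    return epochs, classif_errs, test_errs


def plot_errors(data_points, filename='errors.png'):
    """Hypothetical helper: plot both error columns against the epoch and save to a file."""
    # Imported here so split_columns works even where matplotlib is not installed
    import matplotlib.pyplot as plt

    epochs, classif_errs, test_errs = split_columns(data_points)
    fig, ax = plt.subplots()
    ax.plot(epochs, classif_errs, marker='o', label='Classif Err')
    ax.plot(epochs, test_errs, marker='s', label='Test set error')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Error')
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
```

Saving to a file keeps the sketch usable without a display; calling plt.show() instead of fig.savefig() opens an interactive window.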

re.findall returns a list containing one tuple with the two captured groups:

>>> re.findall(ep, i)
[('1', '52.926')]

Here, i is your first log line.

To grab the test set error, find the last '=' and take the remaining characters:

>>> i2 = '# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317'
>>> i2[i2.rfind('=')+1:]
' 37.2317'

I used list(map(float, epoch[0] + (error_value,))) to convert the values from strings to floats (in Python 3, map returns an iterator, hence the list call):

>>> list(map(float, re.findall(ep, i)[0]+(i2[i2.rfind('=')+1:],)))
[1.0, 52.926, 37.2317]
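The next(f) call relies on the error line immediately following the epoch line. If other lines can appear in between, a sketch that instead remembers the most recent epoch line could look like this. pair_errors and the compiled patterns are hypothetical names; if an epoch has several batch lines, the Classif Err of the last one before the test line is kept.

```python
import re

# Hypothetical patterns matching the two sample log lines
EPOCH_RE = re.compile(r'Epoch = (\d+), batch = \d+, Classif Err = (\d+(?:\.\d+)?)')
TEST_RE = re.compile(r'Test set error = (\d+(?:\.\d+)?)')


def pair_errors(lines):
    """Hypothetical helper: pair each test set error with the latest epoch line before it."""
    data_points = []
    current = None  # (epoch, classif_err) from the last epoch line seen, if any
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            current = (int(m.group(1)), float(m.group(2)))
            continue
        m = TEST_RE.search(line)
        if m and current is not None:
            data_points.append([current[0], current[1], float(m.group(1))])
    return data_points
```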

Upvotes: 1

Shahin

Reputation: 1475

When I tried your code further, I found another problem besides epoch not being defined: line.split() returns a list, so += only works if epoch is itself a list; concatenating a list to a string raises a TypeError. I reworked the code into something like this:

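epoch = []
test_err = []
with open('log.txt') as f1:  # f1 was already closed in the question, so reopen it
    for line in f1:
        line_list = line.split()
        if 'Epoch' in line_list:
            epoch_index = line_list.index('Epoch')
            message = ' '.join(line_list[epoch_index:])
            epoch.append(message)
        elif 'Test set error' in line:
            # 'Test set error' spans three words, so it is never a single item of line_list;
            # test the raw line and locate the word 'Test' instead
            error_index = line_list.index('Test')
            message = ' '.join(line_list[error_index:])
            test_err.append(message)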
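A quick way to see the type issue: += with the result of line.split() works on a list but raises a TypeError on a string, and even on a list it adds every piece of the split line rather than just the epoch value. try_extend is a hypothetical helper used only for this demonstration:

```python
def try_extend(acc, line):
    """Hypothetical helper: apply acc += line.split('Epoch = ') and return acc or the error."""
    try:
        acc += line.split('Epoch = ')
        return acc
    except TypeError as exc:
        return exc


line = 'INFO - Epoch = 1, batch = 216'
print(try_extend([], line))                 # ['INFO - ', '1, batch = 216']
print(type(try_extend('', line)).__name__)  # TypeError
```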

Upvotes: 1
