STD

Reputation: 27

How to split a log file into several csv files with python

I'm pretty new to Python and coding in general, so sorry in advance for any dumb questions. My program needs to split an existing log file into several *.csv files (run1.csv, run2.csv, ...) based on the keyword 'MYLOG'. Whenever the keyword appears, it should start copying the two desired columns into a new file until the keyword appears again. When finished, there should be as many csv files as there are keyword occurrences.


53.2436     EXP     MYLOG: START RUN specs/run03_block_order.csv
53.2589     EXP     TextStim: autoDraw = None
53.2589     EXP     TextStim: autoDraw = None
55.2257     DATA    Keypress: t
57.2412     DATA    Keypress: t
59.2406     DATA    Keypress: t
61.2400     DATA    Keypress: t
63.2393     DATA    Keypress: t
...
89.2314     EXP     MYLOG: START BLOCK scene [specs/run03_block01.csv]
89.2336     EXP     Imported specs/run03_block01.csv as conditions
89.2339     EXP     Created sequence: sequential, trialTypes=9
...

[EDIT]: The output per file (run*.csv) should look like this:

onset       type
53.2436     EXP     
53.2589     EXP     
53.2589     EXP     
55.2257     DATA    
57.2412     DATA    
59.2406     DATA    
61.2400     DATA    
...

The program creates as many run*.csv files as needed, but I can't store the desired columns in the new files. When finished, all I get are empty csv files. If I change the counter condition to == 1, it creates just one big file with the desired columns.

Thanks again!

import csv

QUERY = 'MYLOG'

with open('localizer.log', 'rt') as log_input:
    i = 0

    for line in log_input:

        if QUERY in line:
            i = i + 1

            with open('run' + str(i) + '.csv', 'w') as output:
                reader = csv.reader(log_input, delimiter = ' ')
                writer = csv.writer(output)
                content_column_A = [0]
                content_column_B = [1]

                for row in reader:
                    content_A = list(row[j] for j in content_column_A)
                    content_B = list(row[k] for k in content_column_B)
                    writer.writerow(content_A)
                    writer.writerow(content_B)

Upvotes: 1

Views: 1589

Answers (2)

Waylon Walker

Reputation: 563

You can use pandas to simplify this problem.

Import pandas and read in the log file.

import pandas as pd

df = pd.read_fwf('localizer2.log', header=None)
df.columns = ['onset', 'type', 'event']
df.set_index('onset', inplace=True)

Set a flag where the third column starts with 'MYLOG'

df['flag'] = 0
df.loc[df.event.str[:5] == 'MYLOG', 'flag'] = 1
df.flag = df['flag'].cumsum()

Save each run as a separate run*.csv file

for i in range(1, df.flag.max()+1):
    df.loc[df.flag == i, 'event'].to_csv('run{0}.csv'.format(i))
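The flag/cumsum step is what does the grouping: each 'MYLOG' line bumps the counter by one, so every row between two markers shares the same flag value. A minimal sketch of that idea on made-up toy data:

```python
import pandas as pd

# Toy event column: two 'MYLOG' markers, each followed by one data row
s = pd.Series(['MYLOG: START RUN', 'Keypress: t',
               'MYLOG: START BLOCK', 'Keypress: t'])

# 1 where a new run starts, 0 elsewhere; cumsum turns that into run ids
flag = (s.str[:5] == 'MYLOG').astype(int).cumsum()
print(flag.tolist())  # [1, 1, 2, 2]
```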

EDIT: It looks like your format is different than I originally assumed, so I changed the code to use pd.read_fwf. My localizer.log file was a copy and paste of your original data; hope this works for you. I assumed from the original post that the file has no headers. If it does have headers, remove header=None and the df.columns = ['onset', 'type', 'event'] line.

Upvotes: 0

Geekfish

Reputation: 2314

Looking at the code, there are a few things that are possibly wrong:

  1. The csv reader should take a file handle, not a single line.
  2. The reader delimiter should not be a single space character, since the actual delimiter in your logs is a variable number of consecutive space characters.
  3. The looping logic seems to be a bit off, confusing files, lines, and rows.
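On point 2, calling str.split() with no argument is what handles the variable-width spacing; a quick comparison on a line shaped like the ones in the question:

```python
line = '53.2436     EXP     MYLOG: START RUN'

# split(' ') treats every single space as a delimiter, producing empty fields
print(line.split(' ')[:4])   # ['53.2436', '', '', '']

# split() with no argument collapses runs of whitespace
print(line.split()[:2])      # ['53.2436', 'EXP']
```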

You may be looking at something like the code below (pending clarification in the question):

import csv
NEW_LOG_DELIMITER = 'MYLOG'

def write_buffer(_index, buffer):
    """
    This function takes an index and a buffer.
    The buffer is just an iterable of iterables (ex a list of lists)
    Each buffer item is a row of values.
    """
    filename = 'run{}.csv'.format(_index)
    with open(filename, 'w') as output:
        writer = csv.writer(output)
        writer.writerow(['onset', 'type'])  # adding the heading
        writer.writerows(buffer)

current_buffer = []
_index = 1

with open('localizer.log', 'rt') as log_input:
    for line in log_input:
        # will deal ok with multi-space as long as
        # you don't care about the last column
        fields = line.split()[:2]
        if NEW_LOG_DELIMITER not in line or not current_buffer:
            # If it's the first line (the current_buffer is empty)
            # or the line does NOT contain "MYLOG" then
            # collect it until it's time to write it to file.
            current_buffer.append(fields)
        else:
            write_buffer(_index, current_buffer)
            _index += 1
            current_buffer = [fields]  # EDIT: fixed bug, new buffer should not be empty
    if current_buffer:
        # We are now out of the loop,
        # if there's an unwritten buffer then write it to file.
        write_buffer(_index, current_buffer)
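To see the buffering logic in isolation, here is the same splitting rule run on an in-memory sample (no file I/O; the sample lines are shortened from the question's log):

```python
sample_lines = [
    '53.2436     EXP     MYLOG: START RUN specs/run03_block_order.csv',
    '55.2257     DATA    Keypress: t',
    '89.2314     EXP     MYLOG: START BLOCK scene',
    '89.2336     EXP     Imported specs/run03_block01.csv as conditions',
]

buffers = []
current_buffer = []
for line in sample_lines:
    fields = line.split()[:2]
    if 'MYLOG' not in line or not current_buffer:
        # First line, or an ordinary line: keep collecting
        current_buffer.append(fields)
    else:
        # A new 'MYLOG' marker: flush the finished run, start the next one
        buffers.append(current_buffer)
        current_buffer = [fields]
if current_buffer:
    buffers.append(current_buffer)

print(buffers[0])  # [['53.2436', 'EXP'], ['55.2257', 'DATA']]
print(buffers[1])  # [['89.2314', 'EXP'], ['89.2336', 'EXP']]
```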

Upvotes: 1
