mauve

Reputation: 2763

skip blank lines with csv.reader

I need to read in a .txt file, find the row that contains the labels, and return a list (or other iterable) of those labels plus the index of the next line. In this program, the first call opens the file and returns the labels (which are consistent) along with the index of the next line, which tells me where np.genfromtxt should start reading. Subsequent calls only need the index.

Sometimes the technician adds an extra carriage return when entering test parameters, which leaves an extra blank line in the file. When that happens, I get an empty set instead of labels. From the manual, it seems that csv.reader treats that blank line as EOF, but I don't see how to tell it to keep checking.

Is there a way to make it do that? Is there a better way to accomplish what I want?


import csv

def get_labels(filename):
    index = 0
    with open(filename, 'rb') as f:
        dialect = csv.Sniffer().sniff(f.read())
        f.seek(0)
        reader = csv.reader(f, dialect)
        for row in reader:
            if 'TimeStamp (s)' not in row:
                index += 1
            else:
                return row, index + 1

Update: I tried writing a strip function that copies the file without its blank lines, but it feels clunky and I don't think it's the way to go. Here's what I have so far:

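def strip(filename, otherfile):
    # Copy filename to otherfile, dropping blank lines along the way.
    with open(otherfile, 'wb') as o:
        with open(filename, 'rb') as f:
            for line in f:
                # In binary mode a blank line may be '\r\n', so test the stripped line.
                if not line.strip():
                    continue
                o.write(line)
    # Both files are closed by the with blocks; return the path, not a closed handle.
    return otherfile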

Upvotes: 0

Views: 4051

Answers (1)

tdelaney

Reputation: 77337

The quick way to solve the problem is to strip the empty lines before they reach csv.reader. You can use itertools.ifilter (Python 2) with str.strip as the predicate to do the job:

import csv
import itertools

def get_labels(filename):
    index = 0
    with open(filename, 'rb') as f:
        # Sniff the dialect from the first four non-blank lines only.
        nonblank = itertools.ifilter(str.strip, f)
        sample = ''.join(itertools.islice(nonblank, 4))
        dialect = csv.Sniffer().sniff(sample)
        f.seek(0)
        # str.strip returns '' (falsy) for blank lines, so ifilter drops them.
        reader = csv.reader(itertools.ifilter(str.strip, f), dialect)
        for row in reader:
            if 'TimeStamp (s)' not in row:
                index += 1
            else:
                return row, index + 1
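
As a side note, csv.reader does not stop at a blank line; it returns an empty row for it and carries on. The sketch below shows that, and shows the effect of filtering with str.strip. It is written for Python 3, where the built-in filter is lazy, and the sample text and variable names are made up for illustration rather than taken from the real files:

```python
import csv
import io

# Hypothetical sample: a parameter line, a stray blank line, then the label row.
sample = "Operator,Bob\n\nTimeStamp (s),Load (N)\n0.0,1.5\n"

# Unfiltered: the blank line comes back as an empty list, not as end-of-file.
raw_rows = list(csv.reader(io.StringIO(sample)))

# Filtered: str.strip returns '' (falsy) for whitespace-only lines, so they are dropped.
clean_rows = list(csv.reader(filter(str.strip, io.StringIO(sample))))

print(raw_rows)
print(clean_rows)
```

Since the blank line only produces an empty row, the sniffed dialect may be worth checking too when the labels go missing.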

Alternatively, you could write your own generator to drop the blank lines instead of using ifilter:

import csv

def strip_lines(iterable, maxlines=None):
    # Yield the non-blank lines; with maxlines set, stop after that many input lines.
    for i, line in enumerate(iterable):
        if maxlines is not None and i >= maxlines:
            break
        if line.strip():
            yield line

def get_labels(filename):
    index = 0
    with open(filename, 'rb') as f:
        # Sniff the dialect from the non-blank lines among the first four.
        dialect = csv.Sniffer().sniff(''.join(strip_lines(f, 4)))
        f.seek(0)
        reader = csv.reader(strip_lines(f), dialect)
        for row in reader:
            if 'TimeStamp (s)' not in row:
                index += 1
            else:
                return row, index + 1
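
The code above is Python 2 (itertools.ifilter, files opened with 'rb'). On Python 3 the same idea might look roughly like the sketch below. The helper name nonblank, the marker and sniff_lines parameters, and the (None, None) fallback are my own assumptions for illustration, not something from the question:

```python
import csv
import itertools

def nonblank(lines):
    """Yield only the lines that contain something other than whitespace."""
    return (line for line in lines if line.strip())

def get_labels(filename, marker='TimeStamp (s)', sniff_lines=4):
    """Return (label_row, index of the next non-blank row), or (None, None)."""
    # The csv docs recommend newline='' for files handed to csv.reader.
    with open(filename, newline='') as f:
        # Sniff the dialect from the first few non-blank lines only.
        sample = ''.join(itertools.islice(nonblank(f), sniff_lines))
        dialect = csv.Sniffer().sniff(sample)
        f.seek(0)
        for index, row in enumerate(csv.reader(nonblank(f), dialect)):
            if marker in row:
                return row, index + 1
    # Assumed fallback: the marker was never found.
    return None, None
```

Note that the returned index counts only non-blank rows, so it is not the physical line number when the file contains blank lines. If you need the physical line number, count the lines yourself while iterating.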

Upvotes: 1
