adelaide01

Reputation: 27

Python csv skip first two empty rows

Before anyone marks this as a duplicate: I have already tried isspace, startswith, the itertools filter functions, and readlines()[2:]. I have a Python script that searches hundreds of CSV files and prints each row whose eighth column from the left contains a matching string (in this case a unique ID).

import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(filename))
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row

The code works with test .csv files. However, the real .csv files I'm working with all start with two blank rows, and I get this error message:

IndexError: list index out of range

Here's my latest code:

import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(filename))
    for row in reader:
        if not row:
            continue
        col8 = str(row[8])
        if col8 == '36862210':
            print row

Upvotes: 0

Views: 5008

Answers (2)

amirouche

Reputation: 7873

Try skipping the first two rows with next() instead:

import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(filename))
    next(reader)
    next(reader)
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row
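A variant of the same idea, sketched in Python 3 syntax (the code in this thread is Python 2): itertools.islice skips a fixed number of rows and simply stops early on a file with fewer than two rows, where a bare next(reader) would raise StopIteration. The helper name rows_after_header_gap and its skip parameter are made up for illustration, and the sample data is an in-memory stand-in for one of the real files.

```python
import csv
import io
from itertools import islice


def rows_after_header_gap(lines, skip=2):
    """Return an iterator over parsed CSV rows, ignoring the first `skip` rows.

    `lines` is any iterable of text lines (an open file works).
    The function name and the `skip` parameter are illustrative only.
    """
    reader = csv.reader(lines)
    # islice stops quietly if there are fewer than `skip` rows,
    # whereas a bare next(reader) would raise StopIteration.
    return islice(reader, skip, None)


# In-memory stand-in for a file that starts with two blank rows.
sample = io.StringIO("\n\na,b,c\nd,e,f\n")
rows = list(rows_after_header_gap(sample))
print(rows)
```

With a real file, the same helper could be called inside a with open(filename, newline='') as f: block, so that each of the hundreds of files is closed as soon as it has been read.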

Upvotes: 3

Matt Anderson

Reputation: 19769

A csv reader takes an iterable, which can be a file object but need not be.

You can create a generator that removes all blank lines from a file like so:

csvfile = open(filename)
filtered_csv = (line for line in csvfile if not line.isspace())

This filtered_csv generator will lazily pull one line at a time from your file object, and skip to the next one if the line is entirely whitespace.

You should be able to write your code like:

for filename in csvfiles:
    csvfile = open(filename)
    filtered_csv = (line for line in csvfile if not line.isspace())
    reader = csv.reader(filtered_csv)
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row

Assuming the non-blank rows are well formed, i.e., each one has an index 8 (at least nine fields), you should not get an IndexError.
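To see what this filter does and does not remove, here is a small in-memory sketch in Python 3 syntax. The sample data is invented. A line made up only of delimiters is not whitespace, so it passes the filter and still parses to a short row.

```python
import csv
import io

# Invented sample: an empty line, a whitespace-only line, a
# delimiter-only line, then one real record.
sample = io.StringIO("\n   \n,,\nid,name,value\n")

# Same generator expression as in the answer above.
filtered = (line for line in sample if not line.isspace())

rows = list(csv.reader(filtered))
print(rows)
```

The first two lines are dropped, but the delimiter-only line comes through as a row of three empty strings. If the "blank" rows in the real files look like that, a length check such as len(row) > 8 would be needed in addition to the filter.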

EDIT: If you're still encountering an IndexError it probably is not because of a line consisting of only whitespace. Catch the exception and look at the row:

try:
    col8 = str(row[8])
    if col8 == '36862210':
        print row
except IndexError:
    # Show the row that is too short instead of silently skipping it.
    print row

to examine the output from the CSV reader that's actually causing the error. If the row is an object that doesn't print its contents, use print list(row) instead.
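The same diagnosis can be done without an exception by checking the row length and recording where each short row came from. This is only a sketch in Python 3 syntax: find_matches, its parameters, and the sample data are hypothetical, while reader.line_num is the csv module's own counter of source lines read so far.

```python
import csv
import io


def find_matches(lines, target, col=8):
    """Return (matches, short_rows) for an iterable of CSV text lines.

    matches    -- rows whose field at index `col` equals `target`
    short_rows -- (line_number, row) pairs too short to have that index
    All names here are hypothetical, chosen for the example.
    """
    reader = csv.reader(lines)
    matches, short_rows = [], []
    for row in reader:
        if len(row) <= col:
            # reader.line_num is the number of source lines read so far.
            short_rows.append((reader.line_num, row))
            continue
        if row[col] == target:
            matches.append(row)
    return matches, short_rows


# Invented sample: an empty line, a delimiter-only line, two records.
sample = io.StringIO(
    "\n"
    ",,\n"
    "a,b,c,d,e,f,g,h,36862210\n"
    "a,b,c,d,e,f,g,h,999\n"
)
matches, short_rows = find_matches(sample, "36862210")
print(matches)
print(short_rows)
```

Printing short_rows alongside the filename shows exactly which lines lack an index 8, which is usually enough to tell whether the problem is blank lines, delimiter-only lines, or records with fewer fields than expected.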

Upvotes: 0
