Reputation: 27
Before anyone marks this as a duplicate: I have already tried isspace(), startswith(), the itertools filter functions, and readlines()[2:]. I have a Python script that searches hundreds of CSV files and prints the row with the matching string (in this case a unique ID) in the eighth column from the left.
import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    # Open each matched file, not the whole list of names
    reader = csv.reader(open(filename))
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print(row)
The code works with my test .csv files. However, the first two rows of every real .csv file I'm working with are blank, and I get this error message:
IndexError: list index out of range
Here's my latest code:
import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(filename))
    for row in reader:
        # Skip rows that the reader parsed as empty lists
        if not row:
            continue
        col8 = str(row[8])
        if col8 == '36862210':
            print(row)
Upvotes: 0
Views: 5008
Reputation: 7873
Try skipping the first two rows with next() instead:
import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(filename))
    # Discard the two blank rows at the top of each file
    next(reader)
    next(reader)
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print(row)
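One caveat: a bare next(reader) raises StopIteration if a file has fewer than two rows. A minimal sketch of the same idea using itertools.islice, which simply yields nothing in that case. It is written for Python 3; the helper name rows_after_header and the in-memory sample data are made up for illustration and are not part of the original script.

```python
import csv
import io
from itertools import islice


def rows_after_header(lines, skip=2):
    """Return an iterator over parsed CSV rows, ignoring the first `skip` rows."""
    reader = csv.reader(lines)
    # islice stops quietly if the input has fewer than `skip` rows
    return islice(reader, skip, None)


# Hypothetical sample: two blank rows followed by two data rows
sample = io.StringIO("\n\na,b,c\nd,e,f\n")
for row in rows_after_header(sample):
    print(row)
```

In the real loop you would pass the opened file instead of the StringIO object.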
Upvotes: 3
Reputation: 19769
A csv reader takes an iterable, which can be a file object but need not be.
You can create a generator that removes all blank lines from a file like so:
csvfile = open(filename)
filtered_csv = (line for line in csvfile if not line.isspace())
This filtered_csv generator lazily pulls one line at a time from your file object and skips to the next one if the line consists entirely of whitespace.
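To see the filter in isolation, here is a small self-contained sketch for Python 3 that runs it over an in-memory file. The helper name non_blank_lines and the sample data are hypothetical, chosen only for the demonstration.

```python
import csv
import io


def non_blank_lines(lines):
    """Lazily yield only the lines that contain something besides whitespace."""
    return (line for line in lines if not line.isspace())


# Hypothetical sample: an empty line, a spaces-only line, then real data
sample = io.StringIO("\n   \nid,name\n1,foo\n")
rows = list(csv.reader(non_blank_lines(sample)))
print(rows)
```

Note that this filters physical lines before parsing, so a quoted field that legitimately spans a blank line would lose that line. That is probably not a concern for files like the ones described here, but it is worth knowing.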
You should be able to write your code like:
for filename in csvfiles:
    csvfile = open(filename)
    # Drop whitespace-only lines before the csv module sees them
    filtered_csv = (line for line in csvfile if not line.isspace())
    reader = csv.reader(filtered_csv)
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print(row)
Assuming the non-blank rows are well formed, i.e. every row has an index 8 (at least nine fields), you should not get an IndexError.
EDIT: If you're still encountering an IndexError, it is probably not caused by a line consisting only of whitespace. Catch the exception and look at the row:
try:
    col8 = str(row[8])
    if col8 == '36862210':
        print(row)
except IndexError:
    # Show the row that is too short to have an index 8
    print(row)
This lets you examine the output from the CSV reader that's actually causing the error. If the row is an object that doesn't print its contents, print list(row) instead.
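If it helps with the hunt, here is a rough diagnostic sketch for Python 3 that reports every row too short to index at [8], together with the line number the reader had reached. The function name find_short_rows and the sample data are made up; reader.line_num is a standard attribute of csv reader objects. It also shows one likely reason an "if not row" check is not enough: the reader returns an empty list for a truly empty line, but a line containing only spaces comes back as a one-element list.

```python
import csv
import io


def find_short_rows(lines, min_fields=9):
    """Return (line_number, row) pairs for rows with fewer than min_fields fields."""
    reader = csv.reader(lines)
    short = []
    for row in reader:
        if len(row) < min_fields:
            # line_num is the number of source lines read so far
            short.append((reader.line_num, row))
    return short


# Hypothetical sample: an empty line, a spaces-only line, then a 9-field row
sample = io.StringIO("\n  \n0,1,2,3,4,5,6,7,8\n")
for line_num, row in find_short_rows(sample):
    print(line_num, row)
```

Running this over one of the real files (pass the opened file in place of the sample) should show which rows are the culprits.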
Upvotes: 0