Reputation: 1624
I have a text file with a list of strings.
I want to search a .csv file for rows that begin with those strings and put them in a new .csv file.
In this instance, the text file is called 'output.txt', the original .csv is 'input.csv' and the new .csv file is 'corrected.csv'.
The code:
import csv
file = open('output.txt')
while 1:
    line = file.readline()
    writer = csv.writer(open('corrected.csv','wb'), dialect = 'excel')
    for row in csv.reader('input.csv'):
        if not row[0].startswith(line):
            writer.writerow(row)
    writer.close()
    if not line:
        break
    pass
The error:
Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 9, in <module>
    writer.writerow(row)
TypeError: 'str' does not support the buffer interface
New error:
Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
    for row in reader:
_csv.Error: line contains NULL byte
The problem was that the CSV file had been saved with tabs instead of commas. The new issue is the following:
Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 13, in <module>
    if row[0] not in lines:
IndexError: list index out of range
The CSV file has 500+ entries of data... does this make a difference?
Upvotes: 1
Views: 5245
Reputation: 82924
Your latest problem:
if row[0] not in lines:
IndexError: list index out of range
The error message mentions a list index.
There is only one list index that it could be talking about: 0.
If 0 is out of range, then len(row) must be zero.
If len(row) is zero, then the corresponding line in the input file must be empty.
If a line in the input file is empty, what do you want to do:
(a) ignore the input line altogether?
(b) raise a (fatal) error?
(c) log an error message somewhere and keep going?
(d) something else?
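Option (a) can be sketched as follows, assuming blank lines should simply be skipped. The helper name non_empty_rows and the in-memory sample data are hypothetical stand-ins, not part of the original script. csv.reader returns an empty list for a blank line, so a plain truthiness check is enough:

```python
import csv
import io


def non_empty_rows(lines):
    """Yield parsed CSV rows, skipping rows that csv.reader returns as []."""
    for row in csv.reader(lines):
        if row:  # a blank input line parses to an empty list
            yield row


# Illustrative in-memory data standing in for input.csv (note the blank line)
sample = io.StringIO("a,1\n\nb,2\n", newline='')
rows = list(non_empty_rows(sample))
print(rows)
```

For option (b) or (c), the same check could raise an exception or write a log message instead of silently skipping the row.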
Upvotes: 0
Reputation: 14900
csv.reader can't open a file; it takes a file object (or any other iterable of lines), not a filename. A better solution would be this:
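import csv

# Collect the strings to match, one per line of output.txt
lines = []
with open('output.txt', 'r') as f:
    for line in f:
        lines.append(line.rstrip('\n'))  # strip only the trailing newline

# newline='' lets the csv module handle line endings itself (Python 3)
with open('corrected.csv', 'w', newline='') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r', newline='') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)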
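Note that `row[0] not in lines` tests for an exact match and keeps the rows that do not match, whereas the question asks for rows that begin with one of the strings. A minimal sketch of that prefix-matching variant follows, assuming a tab-separated input (as the question later mentions) and that blank rows should be skipped. The helper name rows_starting_with and the in-memory sample data are hypothetical; in a real script the StringIO objects would be replaced by files opened with newline=''.

```python
import csv
import io


def rows_starting_with(prefixes, reader):
    """Yield rows whose first field starts with any of the given prefixes."""
    # Drop empty strings: '' is a prefix of everything and would match all rows
    prefixes = tuple(p for p in prefixes if p)
    for row in reader:
        # 'row and ...' skips the empty lists produced by blank lines
        if row and row[0].startswith(prefixes):
            yield row


# In-memory stand-ins for output.txt and a tab-separated input.csv
prefix_file = io.StringIO("abc\nxy\n")
csv_file = io.StringIO("abc123\t1\nzzz\t2\n\nxyz\t3\n", newline='')

prefixes = [line.rstrip('\n') for line in prefix_file]
reader = csv.reader(csv_file, delimiter='\t')

out = io.StringIO(newline='')
writer = csv.writer(out, dialect='excel')
writer.writerows(rows_starting_with(prefixes, reader))
print(out.getvalue())
```

str.startswith accepts a tuple of prefixes, which avoids an explicit inner loop over the strings.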
Upvotes: 2
Reputation: 298046
If you look at the documentation, this is how the reader is initialized:
spamReader = csv.reader(open('eggs.csv', 'r'), ...
Notice the open('eggs.csv', 'r'). You aren't passing a file handle in line 9, so the str itself is being treated as the sequence of lines to parse, which is not what you want.
Replace line 9 with this:
csv.reader(open('input.csv', 'r', newline = ''))
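To illustrate the difference, here is a small sketch using in-memory objects (io.StringIO and io.BytesIO stand in for real files; the variable names are made up for this example). It also shows that, on Python 3, a csv.writer given a binary-mode target fails with a TypeError, which is likely what the 'wb' in the question's code runs into:

```python
import csv
import io

# A plain string is iterated one character at a time, so every
# character of the filename becomes its own one-field "row".
rows_from_str = list(csv.reader('input.csv'))

# A file-like object is iterated line by line, as intended.
fake_file = io.StringIO("a,b\nc,d\n", newline='')
rows_from_file = list(csv.reader(fake_file))

# On Python 3 the csv module writes str, so a binary target fails.
raised = False
try:
    csv.writer(io.BytesIO()).writerow(['a', 'b'])
except TypeError:
    raised = True

print(rows_from_str)
print(rows_from_file)
print(raised)
```

Opening the output file in text mode with newline='' (rather than 'wb') avoids the TypeError.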
Upvotes: 6
Reputation: 69
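Try this (cStringIO exists only in Python 2; on Python 3, io.StringIO takes its place):
import csv
import io

# Read the strings once, skipping blank lines
with open('output.txt') as f:
    prefixes = tuple(line.rstrip('\n') for line in f if line.strip())

# Build the whole output in memory first
buf = io.StringIO(newline='')
writer = csv.writer(buf, dialect = 'excel')
with open('input.csv', newline='') as infile:
    for row in csv.reader(infile):
        # 'row and ...' skips empty rows produced by blank lines
        if row and not row[0].startswith(prefixes):
            writer.writerow(row)

# Dump the buffer to disk in one write
with open('corrected.csv', 'w', newline='') as output:
    output.write(buf.getvalue())
In my experience, using a StringIO buffer for the whole process and then dumping the entire buffer into a file is faster.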
Upvotes: -2