Reputation: 88
With the following data and code snippet, I get the error shown below. Can you please help me with this? I am a beginner in Python. Data:
"Id","Title","Body","Tags"
"Id1","Tit,le1","Body1","Ta,gs1"
"Id","Title","Body","Ta,2gs"
Code:
#!/usr/bin/python
import csv,sys
if len(sys.argv) <> 3:
    print >>sys.stderr, 'Wrong number of arguments. This tool will print first n records from a comma separated CSV file.'
    print >>sys.stderr, 'Usage:'
    print >>sys.stderr, ' python', sys.argv[0], '<file> <number-of-lines>'
    sys.exit(1)
fileName = sys.argv[1]
n = int(sys.argv[2])
i = 0
out = csv.writer(sys.stdout, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
ret = []
def read_csv(file_path, has_header = True):
    with open(file_path) as f:
        if has_header: f.readline()
        data = []
        for line in f:
            line = line.strip().split("\",\"")
            data.append([x for x in line])
    return data
ret = read_csv(fileName)
target = []
train = []
target = [x[2] for x in ret]
train = [x[1] for x in ret]
Error:
target = [x[2] for x in ret]
IndexError: list index out of range
Upvotes: 0
Views: 1491
Reputation: 1121176
You are mixing file.readline() with using the file object as an iterable. Don't do that; use next() instead.
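As a minimal sketch of that advice, written in Python 3 syntax and using an in-memory io.StringIO as a stand-in for the opened file (both are assumptions for illustration), next() with a default consumes the header and iteration then continues from the second line:

```python
import io

# Stand-in for an opened file; the text mirrors the question's sample data.
f = io.StringIO(
    '"Id","Title","Body","Tags"\n'
    '"Id1","Tit,le1","Body1","Ta,gs1"\n'
)

# Consume the header line; the None default avoids StopIteration on an empty file.
header = next(f, None)

# Iteration resumes right after the header line.
remaining = [line.rstrip("\n") for line in f]
```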
You should also use csv.reader() from the csv module to read your data; don't reinvent this wheel. The csv module handles quoted CSV values with delimiters embedded in the values much better in any case:
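import csv
def read_csv(file_path, has_header=True):
    # Python 2: open in binary mode, as the csv module expects
    with open(file_path, 'rb') as f:
        reader = csv.reader(f)
        # skip the header row; the None default avoids StopIteration on an empty file
        if has_header: next(reader, None)
        return list(reader)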
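To see the difference on the question's sample data, here is a rough sketch in Python 3 syntax (an assumption, since the function above targets Python 2). io.StringIO stands in for the file, and read_rows is a hypothetical helper name used only for this illustration:

```python
import csv
import io

sample = (
    '"Id","Title","Body","Tags"\n'
    '"Id1","Tit,le1","Body1","Ta,gs1"\n'
    '"Id","Title","Body","Ta,2gs"\n'
)

def read_rows(stream, has_header=True):
    """Parse CSV rows from an open text stream, optionally skipping the header."""
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)
    return list(reader)

# csv.reader strips the quotes and keeps the embedded commas inside each field.
rows = read_rows(io.StringIO(sample))

# Splitting on '","' by hand leaves stray quotes on the first and last fields.
naive = '"Id1","Tit,le1","Body1","Ta,gs1"'.split('","')
```

With csv.reader every row comes back as four clean fields, while the hand-rolled split still carries the outer quote characters.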
Last but not least, you can use zip() to transpose rows and columns:
ret = read_csv(fileName)
train, target = zip(*ret)[1:3] # just the 2nd and 3rd columns; train is x[1] and target is x[2], as in the question
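Here zip() truncates its output to the length of the shortest row, so a short row no longer raises the IndexError you see. Note that if the shortest row has fewer than three columns, the slice yields fewer than two columns and the unpacking raises a ValueError instead.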
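The truncation can be seen in a small sketch with made-up rows (the values are hypothetical, and the syntax is Python 3, where zip() returns an iterator that has to be materialized before slicing):

```python
rows = [
    ["Id1", "Title1", "Body1", "Tags1"],
    ["Id2", "Title2"],  # short row: Body and Tags are missing
]

# Transpose rows into columns; zip() stops at the shortest row.
columns = list(zip(*rows))

# Only the columns present in every row survive, so this slice holds one column.
selected = columns[1:3]
```

Here columns contains just the Id and Title columns, so unpacking selected into two names would fail.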
If there are columns missing in some of the rows, use itertools.izip_longest() instead (itertools.zip_longest() in Python 3):
from itertools import izip_longest
ret = read_csv(fileName)
train, target = list(izip_longest(*ret))[1:3] # izip_longest() returns an iterator, so build a list before slicing out the 2nd and 3rd columns
The default is to replace missing columns with None; if you need to use a different value, pass a fillvalue argument to izip_longest():
train, target = list(izip_longest(*ret, fillvalue=0))[1:3] # izip_longest() returns an iterator, so build a list before slicing out the 2nd and 3rd columns
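For Python 3, the same idea might look like the sketch below. The rows are made-up sample values and the empty-string fill value is an arbitrary choice for illustration; the train and target names follow the question, where train is column 1 and target is column 2:

```python
from itertools import zip_longest

rows = [
    ["Id1", "Title1", "Body1", "Tags1"],
    ["Id2", "Title2"],  # short row: Body and Tags are missing
]

# Pad short rows with "" while transposing, then materialize the iterator.
columns = list(zip_longest(*rows, fillvalue=""))

# Second and third columns; the missing Body of the short row becomes "".
train, target = columns[1:3]
```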
Upvotes: 3