Reputation: 803
I'm using the Pandas library to work with text because I find it far easier than the csv module. Here's the problem. I have a .csv file with multiple columns: subtitle, title, and description. Here's how I access the row content within each column.
import pandas
colnames = ['subtitle', 'description', 'title']
# Raw string so the backslashes in the Windows path are not read as escapes
data = pandas.read_csv(r'C:\Users\B\cwitems.csv', names=colnames)
subtit = list(data.subtitle)
desc = list(data.description)
title = list(data.title)
for line in zip(subtit, desc, title):
    print(line)
The issue is that, for whatever reason, when I print line, the expected subtitle isn't printed. When I print each desc, the title shows up. And when I print subtit by itself, the description is printed. Thus, it appears that each column is shifted by one position. Can anyone explain this behavior? Is it expected, and how do I avoid it?
Upvotes: 1
Views: 2265
Reputation: 1
I had a similar problem. It turned out the .csv I was trying to download had no comma at the end of the header row, but did have commas at the end of every other row. Passing index_col=False (not index_col=None, the default) forces pandas not to use the first column as the index, which got my data to line up correctly.
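To make that concrete, here is a minimal sketch of the trailing-comma case. The inline data and the names raw, inferred and forced are made up for illustration, not taken from the original file.

```python
import io
import pandas as pd

# Hypothetical data: the header row has no trailing comma, the data rows do.
raw = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default (index_col=None): each data row has one more field than the header,
# so pandas uses the first field as the index and the values shift by one.
inferred = pd.read_csv(io.StringIO(raw))

# index_col=False: the first column is not used as the index,
# so the values stay under their own headers.
forced = pd.read_csv(io.StringIO(raw), index_col=False)

print(inferred)
print(forced)
```

Depending on the pandas version, the second call may emit a ParserWarning about the header length not matching the data.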
Upvotes: -1
Reputation: 6383
I think you were trying to load a file with 4 columns but gave only 3 column names. If you only need to load the first 3 columns, use
data = pandas.read_csv(r'C:\Users\B\cwitems.csv', names=colnames, usecols=[0,1,2])
You don't have to delete the unused column in the file.
By default, read_csv loads all columns. In your case the file has one more column than you have names, so the first column is used as the DataFrame index and the remaining columns are all shifted by 1.
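Here is a small sketch that reproduces the shift and the fix. The inline four-column data is a made-up stand-in for the real file, and shifted and aligned are just illustrative names.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: four columns per row, no header row.
raw = "s1,d1,t1,x1\ns2,d2,t2,x2\n"
colnames = ['subtitle', 'description', 'title']

# Three names for four columns: the first column becomes the index,
# so each name ends up on the column to its right.
shifted = pd.read_csv(io.StringIO(raw), names=colnames)

# Reading only the first three columns keeps the names lined up.
aligned = pd.read_csv(io.StringIO(raw), names=colnames, usecols=[0, 1, 2])

print(shifted)
print(aligned)
```

In shifted, data.subtitle holds the description values, which matches the symptom in the question.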
Upvotes: 2
Reputation: 803
It appears that I solved the problem, though I didn't find this anywhere in the docs, so perhaps a more experienced Pandas user can explain why/how. I certainly cannot.
Here's what I did: I deleted an unused column (the last column in my .csv file), and that reset the indices to their proper/expected order. I have no idea what explains the behavior (or its correction), whether it's related to my .csv file or whether it's a Pandas thing (and perhaps only a Pandas issue when working with text). I don't know.
Either way, I really appreciate all of the help! I got lucky this time.
Upvotes: 0
Reputation: 35149
Not sure if this is an answer, but it was too long for a comment. Feel free to ignore it.
>>> from itertools import izip_longest
>>>
>>> l1 = [1,2]
>>> l2 = [1,2,3,4,5]
>>> l3 = [1,2,3]
>>>
>>> for line in izip_longest(l1,l2,l3):
...     print line
This will print:
(1, 1, 1)
(2, 2, 2)
(None, 3, 3)
(None, 4, None)
(None, 5, None)
Upvotes: 1