Reputation: 1787
I read a CSV file using
x = pd.read_table('path to csv')
and when I print x, I see each row as a comma-separated list of the data values, which looks fine. But when I try to access any column using x.col1, I get an error:
**AttributeError: 'DataFrame' object has no attribute 'col1'**
I also tried:
y = DataFrame(x)
and retrieving the column via y, but no luck. However, x.columns works. I can't figure out what the problem is.
Please help!
Upvotes: 1
Views: 13381
Reputation: 31
I had the same issue and tried all the answers (including the first one), but none worked for me until I ran
print(dataset.columns.tolist())
then I found the devil:
['\xef\xbb\xbfLabel', 'blabla','blabla']
Notice the first element: it should be 'Label'. (By the way, pandas did not seem happy with 'Label' as the name of my label column, so I later changed it to something else.)
I did a little digging and found an article explaining it:
The \x prefix means each value is a hexadecimal byte. Together, the bytes \xef\xbb\xbf form a Byte Order Mark (BOM), which indicates that the text is Unicode (here, UTF-8).
Why does it matter to us? You cannot assume the files you read are clean. They might contain extra symbols like this that can throw your scripts off.
I tried many ways to get rid of it, and the most convenient one turned out to be adding an empty ',' before the first column. (I am using CSV, so this adds an empty column in front of the first real column, which exists only to absorb the junk.) The columns then become:
['\xef\xbb\xbf', 'Label', 'blabla', 'blabla']
Problem solved!
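A less invasive option, assuming the file is UTF-8 with a leading BOM as above, may be to let Python's 'utf-8-sig' codec drop the marker while reading. The sketch below uses a made-up in-memory file (the bytes in raw are an illustration, not my real data) in place of the actual CSV:

```python
import io
import pandas as pd

# Hypothetical file contents: a UTF-8 BOM followed by a tiny CSV.
raw = b"\xef\xbb\xbfLabel,a,b\n1,2,3\n"

# 'utf-8-sig' decodes UTF-8 and skips a leading BOM if one is present.
dataset = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(dataset.columns.tolist())
```

With a real file, the same encoding argument would be passed along with the path, and the original file would not need to be edited.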
Upvotes: 0
Reputation: 863741
I think read_table uses a tab as its default separator, so you need to specify the separator explicitly:
x = pd.read_table('path to csv', sep=',')
Or use read_csv, whose default separator is a comma, so sep can be omitted:
x = pd.read_csv('path to csv')
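To see why the attribute lookup fails, here is a small sketch using an in-memory string as a stand-in for the file (csv_text and its column names are assumptions for illustration):

```python
import io
import pandas as pd

# Hypothetical stand-in for the comma-separated file on disk.
csv_text = "col1,col2\n1,2\n3,4\n"

# read_table splits on tabs by default, so each whole line becomes one field
# and the frame ends up with a single column named 'col1,col2'.
wrong = pd.read_table(io.StringIO(csv_text))
print(wrong.columns.tolist())

# With the separator set to a comma, the columns are parsed as expected.
right = pd.read_table(io.StringIO(csv_text), sep=",")
print(right.columns.tolist())
print(right.col1.tolist())
```

In the first case there is no column called col1 at all, which would explain the AttributeError even though printing the frame looks reasonable.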
Upvotes: 1
Reputation: 14699
Try stripping any whitespace around the column names with this:
x.columns = [col.strip() for col in x.columns.tolist()]
Or, as suggested in the documentation here and highlighted in @jezrael's answer:
x.columns = x.columns.str.strip()
Then you will be able to access the columns as x.col1 ... x.coln. Also be aware that column names are case-sensitive.
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]], columns=[' col1', 'col2 '])
>>> df
    col1  col2 
0      1      2
1      3      4
>>> df.col1
Traceback (most recent call last):
  ...
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'col1'
>>> df.col2
Traceback (most recent call last):
  ...
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'col2'
>>> df.columns = [col.strip() for col in df.columns.tolist()]
>>> df.col1
0    1
1    3
Name: col1, dtype: int64
>>> df.col2
0    2
1    4
Name: col2, dtype: int64
>>>
Upvotes: 0