Vash

Reputation: 1787

'DataFrame' object has no attribute 'col_name'

I read a csv file using

x = pd.read_table('path to csv')

Printing x shows the data values as comma-separated rows, which looks fine. But when I try to access a column with x.col1, I get an error:

**AttributeError: 'DataFrame' object has no attribute 'col1'**

I also tried:

y = DataFrame(x)

and retrieving the column via y, but no luck. However, x.columns works. I can't figure out what the problem is here.

Please help!!

Upvotes: 1

Views: 13381

Answers (3)

Qianru Zhou

Reputation: 31

I had the same issue and tried all the other answers, but none of them worked for me until I ran

 print(dataset.columns.tolist())

then I found the devil:

['\xef\xbb\xbfLabel', 'blabla','blabla']

Notice the first element of the list: it should be 'Label'. (By the way, pandas did not seem to accept 'Label' as a column name in my case, so I later changed it to something else.)

I did a little digging and found an article explaining it. The \x prefix marks hexadecimal byte values, and the three bytes \xef\xbb\xbf are the UTF-8 Byte Order Mark (BOM), which some programs write at the start of a file to indicate that the text is Unicode.

Why does it matter to us? You cannot assume the files you read are clean. They might contain extra bytes like these that can throw your scripts off.

I tried many ways to get rid of it, and the most convenient one was to add a ',' before the first column name in the file. Since I am using CSV, this creates an empty column ahead of the first real column, and that column absorbs the junk bytes. The columns then become:

['\xef\xbb\xbf', 'Label', 'blabla', 'blabla']

Problem solved!
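
If you would rather not edit the file, another option is to let the decoder drop the BOM while reading. The following is only a minimal sketch: it assumes the file is UTF-8 with a leading BOM, and the sample bytes are made up for illustration rather than taken from my dataset.

```python
import io

import pandas as pd

# Hypothetical in-memory file: a UTF-8 BOM followed by a small CSV.
raw = b"\xef\xbb\xbfLabel,a,b\n1,2,3\n"

# The 'utf-8-sig' codec decodes UTF-8 and discards a leading BOM if present.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")

# The first column name no longer carries the BOM bytes.
print(df.columns.tolist())
```

With a real file, the same encoding argument would be passed alongside the file path.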

Upvotes: 0

jezrael

Reputation: 863741

I think read_table uses tab as its default separator, so you need to set the sep parameter:

x = pd.read_table('path to csv', sep=',')

Or use read_csv, whose default separator is a comma, so sep can be omitted:

x = pd.read_csv('path to csv')
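
To see why the attribute lookup fails, here is a small sketch using made-up in-memory data in place of the real file. With the tab default, the whole header line ends up as a single column name, so there is no column called col1.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file from the question.
csv_text = "col1,col2\n1,2\n3,4\n"

# read_table splits on tabs by default, so each line stays in one piece.
wrong = pd.read_table(io.StringIO(csv_text))
print(wrong.columns.tolist())  # ['col1,col2']

# read_csv splits on commas by default, giving two separate columns.
right = pd.read_csv(io.StringIO(csv_text))
print(right.columns.tolist())  # ['col1', 'col2']

# Attribute access now works because 'col1' is a real column name.
print(right.col1.tolist())  # [1, 3]
```

This also matches the symptom in the question: printing the frame shows comma-separated rows because each row is one string in a single column.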

Upvotes: 1

Mohamed Ali JAMAOUI

Reputation: 14699

Try stripping any whitespace around the column names:

x.columns = [col.strip() for col in x.columns.tolist()]

Or, as suggested in the documentation and highlighted in @jezrael's answer:

x.columns = x.columns.str.strip() 

Then you will be able to access the columns as x.col1 through x.coln. Also be aware that column names are case-sensitive.

Example:

>>> import pandas as pd 
>>> df = pd.DataFrame([[1,2],[3,4]], columns=[' col1', 'col2 '])
>>> df
    col1  col2 
0      1      2
1      3      4
>>> df.col1
Traceback (most recent call last):
...    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'col1'
>>> df.col2 
Traceback (most recent call last):
...    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'col2'
>>> df.columns = [col.strip() for col in df.columns.tolist()]
>>> df.col1
0    1
1    3
Name: col1, dtype: int64
>>> df.col2 
0    2
1    4
Name: col2, dtype: int64
>>> 

Upvotes: 0
