user3664020

Reputation: 3020

Not able to access a column in pandas data frame

I have a data frame df.

df.columns gives this output:

Index([u'Talk Time\t', u'Hold Time\t', u'Work Time\t', u'Call Type'], dtype='object')

Here, the column 'Talk Time' has a trailing "\t" character, so when I do the following, I get an error:

df['Talk Time']

Traceback (most recent call last):
  File "<ipython-input-78-f2b7b9f43f59>", line 1, in <module>
    old['Talk Time']
  File "C:\Users\Admin\Anaconda\lib\site-packages\pandas\core\frame.py", line 1780, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Admin\Anaconda\lib\site-packages\pandas\core\frame.py", line 1787, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Admin\Anaconda\lib\site-packages\pandas\core\generic.py", line 1068, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Admin\Anaconda\lib\site-packages\pandas\core\internals.py", line 2849, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Admin\Anaconda\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3820)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3700)
  File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12323)
  File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12274)
KeyError: 'Talk Time'
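A quick way to spot hidden whitespace in column labels is to print each label with repr(), which shows escape sequences such as \t. The sketch below builds a small hypothetical frame by hand to mimic the columns above; the row values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame that mimics the column labels from the question
df = pd.DataFrame(
    [[120, 30, 45, 'Inbound']],
    columns=['Talk Time\t', 'Hold Time\t', 'Work Time\t', 'Call Type'],
)

# repr() makes the trailing tab visible, unlike a plain print of the label
for col in df.columns:
    print(repr(col))

# The lookup fails because the stored label includes the tab
print('Talk Time' in df.columns)    # False
print('Talk Time\t' in df.columns)  # True
```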

So I modify the column names to remove the tab characters as follows:

for n in range(len(df.columns)):
    df.columns.values[n] = df.columns.values[n].rstrip()

The tab characters are removed, and df.columns gives the following output:

Index([u'Talk Time', u'Hold Time', u'Work Time', u'Call Type'], dtype='object')

But when I try to access the column with

df['Talk Time']

I still get the same error. Why is this happening?

Upvotes: 1

Views: 1473

Answers (2)

Geeocode

Reputation: 5797

The main issue is that you replaced the values of the column Index in place, and that part did work. But pandas treats an Index as immutable and keeps a cached lookup table built from the original labels, so writing into df.columns.values changes what gets displayed without changing the names that lookups actually use. df['Talk Time\t'] would still have worked if you had tried it, but obviously that wasn't the result you wanted. The solution is to assign to df.columns instead of modifying df.columns.values:

df.columns = [c.rstrip() for c in df.columns]

This works as you need.
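A minimal end-to-end check of this fix, using a hypothetical frame with the same column labels (the data values are invented): assigning a new list to df.columns makes pandas build a fresh Index, so lookups by the stripped name succeed.

```python
import pandas as pd

# Hypothetical data with tab-suffixed labels, as in the question
df = pd.DataFrame(
    [[120, 30, 45, 'Inbound'], [95, 0, 20, 'Outbound']],
    columns=['Talk Time\t', 'Hold Time\t', 'Work Time\t', 'Call Type'],
)

# Assign a brand-new list: pandas replaces the whole Index object
df.columns = [c.rstrip() for c in df.columns]

print(list(df.columns))
# ['Talk Time', 'Hold Time', 'Work Time', 'Call Type']
print(df['Talk Time'].tolist())
# [120, 95]
```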

Upvotes: 1

jrjc

Reputation: 21873

I can't reproduce your second error; however, you could do:

df.columns = [i.rstrip() for i in df.columns]

Maybe this will help!
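Two other standard pandas ways to get the same result, shown on a hypothetical frame: Index.str.rstrip() applies the string method to every label, and DataFrame.rename(columns=str.rstrip) calls the function on each label and returns a new frame, leaving the original unchanged unless you reassign it.

```python
import pandas as pd

# Hypothetical frame with the same tab-suffixed labels as in the question
df = pd.DataFrame(
    [[120, 30, 45, 'Inbound']],
    columns=['Talk Time\t', 'Hold Time\t', 'Work Time\t', 'Call Type'],
)

# Option 1: vectorised string method on the column Index, assigned back
df1 = df.copy()
df1.columns = df1.columns.str.rstrip()

# Option 2: rename() calls str.rstrip on every label and returns a new frame
df2 = df.rename(columns=str.rstrip)

print(list(df1.columns))  # ['Talk Time', 'Hold Time', 'Work Time', 'Call Type']
print(list(df2.columns))  # same labels as df1
print(list(df.columns))   # df itself is unchanged and still has the tabs
```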

Upvotes: 0
