Hooked

Reputation: 88228

Why does passing a tuple cause a key error in pandas?

I know that I can select multiple columns if I pass a list of the column names. I was surprised to find that I can't pass a tuple of column names.

import pandas as pd

df = pd.DataFrame([[2,3,4],[3,4,5]],columns=['a','b','c'])

print(df[['a','b']])
print(df[('a','b')])  # KeyError

Why is this? Is there something important I'm missing? As far as I can tell, the only difference is that in the second case the multi-key is immutable.
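Here is a minimal runnable sketch of the difference, assuming Python 3 and a recent pandas. The list is expanded into several labels. The tuple is looked up as one label named ('a', 'b'). Calling `list()` on the tuple is one way to keep storing the column names in a tuple.

```python
import pandas as pd

df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=['a', 'b', 'c'])

# A list is treated as a collection of column labels.
subset = df[['a', 'b']]

# A tuple is hashable, so pandas looks it up as ONE label named ('a', 'b').
# No such column exists, so a KeyError is raised.
try:
    df[('a', 'b')]
    tuple_failed = False
except KeyError:
    tuple_failed = True

# Converting the tuple to a list gives the multi-column selection.
cols = ('a', 'b')
same_subset = df[list(cols)]

print(subset.columns.tolist())       # ['a', 'b']
print(tuple_failed)                  # True
print(same_subset.equals(subset))    # True
```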

Upvotes: 4

Views: 27994

Answers (3)

unutbu

Reputation: 880717

A tuple is treated as a single key because tuples are also used to select one column from a DataFrame whose columns are a MultiIndex:

import pandas as pd
columns = pd.MultiIndex.from_arrays([['a','b','c'], ['X','Y','Z']])
df = pd.DataFrame([[2,3,4],[3,4,5]], columns=columns)
#    a  b  c
#    X  Y  Z
# 0  2  3  4
# 1  3  4  5

Then a tuple selects a single column:

In [203]: df[('a','X')]
Out[203]: 
0    2
1    3
Name: (a, X), dtype: int64

A list of tuples selects multiple columns, with each tuple specifying one column:

In [204]: df[[('a','X'), ('b','Y')]]
Out[204]: 
   a  b
   X  Y
0  2  3
1  3  4

DataFrame.__getitem__ uses type checking to produce this behavior (excerpt from the pandas source at the time of writing):

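    # is_mi_columns is set earlier in the method:
    #     is_mi_columns = isinstance(self.columns, MultiIndex)
    # lists are handled here ----------------------vvvv
    if isinstance(key, (Series, np.ndarray, Index, list)):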
        # either boolean or fancy integer index
        return self._getitem_array(key)
    elif isinstance(key, DataFrame):
        return self._getitem_frame(key)
    # tuples are handled here when self has a MultiIndex
    elif is_mi_columns:
        return self._getitem_multilevel(key)
    # or else here
    else:
        return self._getitem_column(key)
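To see why a tuple on ordinary columns falls through to the single-column branch, here is a minimal sketch that mirrors only the isinstance checks in the excerpt. `describe_key` is a hypothetical helper, not a pandas function. It returns the name of the branch a key would take and ignores everything else `__getitem__` does (slices, callables, the column-name shortcut).

```python
import numpy as np
import pandas as pd


def describe_key(key, columns):
    """Hypothetical, simplified mirror of the branch order in the excerpt.

    Returns a string naming which branch a key would take; it does not
    select any data.
    """
    is_mi_columns = isinstance(columns, pd.MultiIndex)
    if isinstance(key, (pd.Series, np.ndarray, pd.Index, list)):
        return "array"        # list-like: several labels
    elif isinstance(key, pd.DataFrame):
        return "frame"        # boolean DataFrame mask
    elif is_mi_columns:
        return "multilevel"   # e.g. a tuple on MultiIndex columns
    else:
        return "column"       # any other hashable key is ONE label


flat = pd.Index(['a', 'b', 'c'])
multi = pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['X', 'Y', 'Z']])

print(describe_key(['a', 'b'], flat))   # array
print(describe_key(('a', 'b'), flat))   # column (lookup of label ('a', 'b'))
print(describe_key(('a', 'X'), multi))  # multilevel
```

The tuple never reaches the "array" branch because `tuple` is not in the isinstance list, which is why it is looked up as a single label.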

Upvotes: 3

Merlin

Reputation: 25679

df = pd.DataFrame([[2,3,4],[3,4,5]],columns=['a','b','c'])
df.columns

#Index([u'a', u'b', u'c'], dtype='object')

There is no column with the key ('a','b'); only the keys 'a', 'b' and 'c' exist.
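The same single-label rule shows up in membership tests on `df.columns`. Here is a minimal sketch, assuming Python 3 and a recent pandas:

```python
import pandas as pd

df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=['a', 'b', 'c'])

# `in` on an Index tests for a single label, just as df[key] looks up one label.
print('a' in df.columns)          # True
print(('a', 'b') in df.columns)   # False: no column is named ('a', 'b')

# Check each label separately to see whether all of them exist.
print(all(col in df.columns for col in ('a', 'b')))  # True
```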

Upvotes: 3

jezrael

Reputation: 863511

A column name can itself be a tuple, and in that case you select the column with that tuple:

import pandas as pd

df = pd.DataFrame([[2,3,4],[3,4,5]],columns=[('a', 'b'),'b','c'])
print (df)
   (a, b)  b  c
0       2  3  4
1       3  4  5

print (df[('a','b')])
0    2
1    3
Name: (a, b), dtype: int64
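Here is a minimal sketch, assuming Python 3 and a recent pandas. Once a column is literally named `('a', 'b')`, the tuple selects that one column, and a list can still combine it with ordinary labels:

```python
import pandas as pd

# One column is literally named by the tuple ('a', 'b').
df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=[('a', 'b'), 'b', 'c'])

# A tuple key is looked up as that single label, giving a Series.
single = df[('a', 'b')]

# A list may mix the tuple label with ordinary labels, giving a DataFrame.
several = df[[('a', 'b'), 'c']]

print(type(single).__name__)        # Series
print(single.tolist())              # [2, 3]
print(several.columns.tolist())     # [('a', 'b'), 'c']
```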

Upvotes: 4
