Reputation: 88228
I know that I can select multiple columns if I pass a list of the column names. I was surprised to find that I can't pass a tuple of column names.
import pandas as pd
df = pd.DataFrame([[2,3,4],[3,4,5]],columns=['a','b','c'])
print(df[['a','b']])
print(df[('a','b')])  # raises KeyError
Why is this? Is there something important I'm missing? As far as I can tell, the only difference is that in the second case the multi-key is immutable.
Upvotes: 4
Views: 27994
Reputation: 880717
Tuples are also used to select a column from a DataFrame with a MultiIndex:
import pandas as pd
columns = pd.MultiIndex.from_arrays([['a','b','c'], ['X','Y','Z']])
df = pd.DataFrame([[2,3,4],[3,4,5]], columns=columns)
#    a  b  c
#    X  Y  Z
# 0  2  3  4
# 1  3  4  5
then
In [203]: df[('a','X')]
Out[203]:
0    2
1    3
Name: (a, X), dtype: int64
A list of tuples selects multiple columns, with each tuple specifying one column:
In [204]: df[[('a','X'), ('b','Y')]]
Out[204]:
   a  b
   X  Y
0  2  3
1  3  4
DataFrame.__getitem__ uses type checking to produce this behavior:
# lists are handled here ----------------------vvvv
if isinstance(key, (Series, np.ndarray, Index, list)):
    # either boolean or fancy integer index
    return self._getitem_array(key)
elif isinstance(key, DataFrame):
    return self._getitem_frame(key)
# tuples are handled here when self has a MultiIndex
elif is_mi_columns:
    return self._getitem_multilevel(key)
# or else here
else:
    return self._getitem_column(key)
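To see the effect of that dispatch from the outside, here is a minimal sketch using only the public indexing API. The variable names (flat, multi, by_list, tuple_failed, one_column) are made up for illustration, and the internal method names quoted above may differ between pandas versions, so the sketch does not rely on them.

```python
import pandas as pd

flat = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=['a', 'b', 'c'])

# A list is treated as a collection of keys, so this returns a two-column DataFrame.
by_list = flat[['a', 'b']]

# A tuple is treated as a single key; no column is labelled ('a', 'b'),
# so the lookup raises KeyError.
try:
    flat[('a', 'b')]
    tuple_failed = False
except KeyError:
    tuple_failed = True

# With MultiIndex columns, a tuple names exactly one column (one label per level).
multi = pd.DataFrame(
    [[2, 3, 4], [3, 4, 5]],
    columns=pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['X', 'Y', 'Z']]),
)
one_column = multi[('a', 'X')]
```

Here by_list is a DataFrame, while one_column is a Series, which matches the "list means many keys, tuple means one key" reading of the code above.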
Upvotes: 3
Reputation: 25679
import pandas as pd
df = pd.DataFrame([[2,3,4],[3,4,5]],columns=['a','b','c'])
df.columns
#Index([u'a', u'b', u'c'], dtype='object')
You do not have the key ('a','b'); only the keys 'a', 'b', and 'c' exist.
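If you are handed a tuple of column names and want the multi-column selection anyway, one possible workaround is to check whether the tuple is itself a column label and otherwise convert it to a list. This is only a sketch; wanted, is_single_key and subset are hypothetical names, not anything from pandas.

```python
import pandas as pd

df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=['a', 'b', 'c'])
wanted = ('a', 'b')  # hypothetical tuple of column names supplied by a caller

# The tuple as a whole is not one of the column labels.
is_single_key = wanted in df.columns

# Converting the tuple to a list asks for each name as a separate key.
subset = df[list(wanted)]
```

With these columns, is_single_key is False and subset holds the columns 'a' and 'b'.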
Upvotes: 3
Reputation: 863511
A column name can itself be a tuple, in which case you select that column with the tuple:
import pandas as pd
df = pd.DataFrame([[2,3,4],[3,4,5]],columns=[('a', 'b'),'b','c'])
print (df)
   (a, b)  b  c
0       2  3  4
1       3  4  5
print (df[('a','b')])
0    2
1    3
Name: (a, b), dtype: int64
Upvotes: 4