jab

Reputation: 5823

Pandas set_index Multiindex Lookup

I cannot find a way to look up rows by a MultiIndex in Pandas 0.14. Here is some mock data that reproduces the problem.

Code:

from pandas import DataFrame

row1 = ['red', 'ferrari', 'mine']
row2 = ['blue', 'ferrari', 'his']
row3 = ['red', 'lambo', 'his']
row4 = ['yellow', 'porsche', 'his']
row5 = ['yellow', 'lambo', 'his']
all_dat = [row1, row2, row3, row4, row5]
df = DataFrame(all_dat, columns=['Color', 'Make', 'Ownership'])

print df
df = df.set_index(['Color', 'Make'])
print df

print df['red']['lambo']
print df['yellow']['porsche']

Output:

    Color     Make Ownership
0     red  ferrari      mine
1    blue  ferrari       his
2     red    lambo       his
3  yellow  porsche       his
4  yellow    lambo       his
               Ownership
Color  Make             
red    ferrari      mine
blue   ferrari       his
red    lambo         his
yellow porsche       his
       lambo         his

Traceback (most recent call last):
    print df['red']['lambo']
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1678, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1685, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1052, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2565, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1181, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas/index.c:3354)
  File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas/index.c:3234)
  File "hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11148)
  File "hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11101)
KeyError: 'red'

I have tried lookups using

df[('red', 'lambo')]

and

df['red', 'lambo']

Both of these also raised a KeyError.

Is there a step I'm missing when setting a MultiIndex? I want to use set_index() because my real data (this is just mock data) goes through many operations before it reaches the point where I redefine the indices.

Upvotes: 2

Views: 2997

Answers (1)

unutbu

Reputation: 880717

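df[...] selects columns, so df['red'] looks for a column named 'red' and raises a KeyError (note _getitem_column in the traceback). Index labels are looked up with df.loc instead, where you can specify the desired labels as a list of tuples: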

In [99]: df.loc[[('red','lambo')]]
Out[99]: 
            Ownership
Color Make           
red   lambo       his

In [106]: df.loc[[('yellow','porsche'), ('red','lambo')]]
Out[106]: 
               Ownership
Color  Make             
yellow porsche       his
red    lambo         his
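
For a single value rather than a sub-frame, a tuple key can be combined with a column name. The following is a minimal sketch, not part of the original answer: it rebuilds the question's mock data and assumes a reasonably recent pandas (behaviour on 0.14 may differ in detail). The names column_lookup_failed, owner and reds are only illustrative.

```python
import pandas as pd

rows = [
    ['red', 'ferrari', 'mine'],
    ['blue', 'ferrari', 'his'],
    ['red', 'lambo', 'his'],
    ['yellow', 'porsche', 'his'],
    ['yellow', 'lambo', 'his'],
]
df = pd.DataFrame(rows, columns=['Color', 'Make', 'Ownership'])
df = df.set_index(['Color', 'Make'])

# df[...] searches the column labels, and 'red' is not a column.
try:
    df['red']
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# A full (Color, Make) tuple plus a column name selects one cell.
owner = df.loc[('red', 'lambo'), 'Ownership']

# xs() selects every row whose Color level equals 'red'.
reds = df.xs('red', level='Color')

print(column_lookup_failed)
print(owner)
print(reds)
```

The tuple form works here because every (Color, Make) pair in the mock data is unique; with duplicate pairs the same lookup would return several rows instead of one value.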

Assignments can be made like this:

In [117]: df.loc[[('red', 'lambo')], 'Ownership'] = 'mine'

In [118]: df
Out[118]: 
               Ownership
Color  Make             
red    ferrari      mine
blue   ferrari       his
red    lambo        mine
yellow porsche       his
       lambo         his

See also: Advanced indexing with hierarchical index
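
As one illustration of what that documentation covers, selecting on the second index level alone can be sketched with pd.IndexSlice. This is an assumption-laden example rather than part of the answer: it rebuilds the question's original mock data, sorts the index first (slicing a MultiIndex generally expects a sorted index), and targets a recent pandas; the sorting call on 0.14 may be spelled differently. The name lambos is only illustrative.

```python
import pandas as pd

rows = [
    ['red', 'ferrari', 'mine'],
    ['blue', 'ferrari', 'his'],
    ['red', 'lambo', 'his'],
    ['yellow', 'porsche', 'his'],
    ['yellow', 'lambo', 'his'],
]
df = pd.DataFrame(rows, columns=['Color', 'Make', 'Ownership'])

# Sort the MultiIndex so that label-based slicing is well defined.
df = df.set_index(['Color', 'Make']).sort_index()

# idx[:, 'lambo'] means "any Color, Make equal to 'lambo'".
idx = pd.IndexSlice
lambos = df.loc[idx[:, 'lambo'], :]

print(lambos)
```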

Upvotes: 2
