Pandas indexer methods and tuples as parameters

Question

Let's say I have a pandas Series, and I want to access a set of elements at specific indices, like so:

In [1]:
from pandas import Series
import numpy as np

s = Series(np.arange(0,10))

In [2]: s.loc[[3,7]]

Out[2]:
3    3
7    7
dtype: int64

The .loc method accepts a list as the parameter for this type of selection. The .iloc and .ix methods work the same way.

However, if I use a tuple for the parameter, both .loc and .iloc fail:

In [5]: s.loc[(3,7)]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
........
IndexingError: Too many indexers

In [6]: s.iloc[(3,7)]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
........

IndexingError: Too many indexers

And .ix produces a strange result:

In [7]: s.ix[(3,7)]
Out[7]: 3

Now, I get that you can't even do this with a plain Python list:

In [27]:
x = list(range(0,10))
x[(3,7)]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
      1 x = list(range(0,10))
----> 2 x[(3,7)]

TypeError: list indices must be integers or slices, not tuple

To retrieve a set of specific indices from a list, you need to use a comprehension, as explained here.
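The linked explanation is not reproduced here, so the following is only a minimal sketch of the comprehension approach, reusing the list x from above. The names wanted and subset are made up for illustration.

```python
x = list(range(0, 10))

# The positions we want; a tuple is fine here because we only iterate over it.
wanted = (3, 7)

# Look up each position one at a time and collect the results.
subset = [x[i] for i in wanted]
print(subset)
```

Note that the tuple is never passed to the list's indexing operator; each integer is used as a separate index.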

But on the other hand, using a tuple to select rows from a pandas DataFrame seems to work fine for all three indexing methods. Here's an example with the .loc method:

In [8]:
from pandas import DataFrame
df = DataFrame({"x" : np.arange(0,10)})

In [9]:
df.loc[(3,7),"x"]

Out[9]:
3    3
7    7
Name: x, dtype: int64

My three questions are:

1. Why won't the Series indexers accept a tuple? It would seem natural to use a tuple, since the set of desired indices is an immutable, single-use parameter. Is this solely for the purpose of mimicking the list interface?
2. What is the explanation for the strange Series .ix result?
3. Why the inconsistency between Series and DataFrame on this matter?

JoeCondron · Accepted Answer

I think the answer to the first question is that tuples are used to locate entries in a MultiIndex, where each element of the tuple addresses one level of the index. I don't think there are good answers to the other two questions, except that you've exposed a bug and an inconsistency, respectively, in the code (this isn't that hard to do :)).

So the Series complains because you don't have a MultiIndex or, more generally, because the length of the tuple is greater than the number of levels in your index. The DataFrame should probably react in the same way but doesn't.

I think the safest way to proceed is to reserve tuples for a MultiIndex and to use lists, arrays or Series for indexing multiple rows. As a side note, you would use a list or array of tuples to select multiple rows in a MultiIndex.
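To make the convention concrete, here is a small sketch of the three cases: a list on a flat index, a tuple on a MultiIndex, and a list of tuples on a MultiIndex. The index labels ("a", "b", 1, 2) and the variable names are invented for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Flat index: pass a list (not a tuple) to pick several labels.
s = pd.Series(np.arange(10))
picked = s.loc[[3, 7]]

# MultiIndex with two levels; the labels here are made up for the example.
mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
ms = pd.Series(np.arange(4), index=mi)

# A tuple addresses a single row, with one element per index level.
single = ms.loc[("a", 2)]

# A list of tuples selects several rows of the MultiIndex.
several = ms.loc[[("a", 2), ("b", 1)]]

print(picked)
print(single)
print(several)
```

Read this way, s.loc[(3, 7)] on the flat Series is asking for level-0 label 3 and level-1 label 7, and since there is only one level, "Too many indexers" is the natural complaint.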
