Reputation: 10843
Let's say I have a pandas Series, and I want to access a set of elements at specific indices, like so:
In [1]:
from pandas import Series
import numpy as np
s = Series(np.arange(0,10))
In [2]: s.loc[[3,7]]
Out[2]:
3 3
7 7
dtype: int64
The .loc method accepts a list as the parameter for this type of selection. The .iloc and .ix methods work the same way.
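Here is a minimal sketch of the same list-based selection on a current pandas release, assuming pandas and NumPy are installed. .ix is left out because it was deprecated in pandas 0.20 and removed in pandas 1.0. .loc takes index labels and .iloc takes integer positions. They agree on s only because s has the default RangeIndex, so the sketch also includes a hypothetical Series with a shifted index to show the difference.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 10))

# .loc: a list of index labels
by_label = s.loc[[3, 7]]

# .iloc: a list of integer positions
by_position = s.iloc[[3, 7]]

# With the default RangeIndex, label 3 is also position 3
print(by_label.tolist())     # [3, 7]
print(by_position.tolist())  # [3, 7]

# Hypothetical Series whose labels start at 10: labels and positions now differ
shifted = pd.Series(np.arange(0, 10), index=np.arange(10, 20))
print(shifted.loc[[13, 17]].tolist())  # [3, 7]
print(shifted.iloc[[3, 7]].tolist())   # [3, 7]
```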
However, if I use a tuple for the parameter, both .loc and .iloc fail:
In [5]: s.loc[(3,7)]
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
........
IndexingError: Too many indexers
In [6]: s.iloc[(3,7)]
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
........
IndexingError: Too many indexers
And .ix produces a strange result:
In [7]: s.ix[(3,7)]
Out[7]: 3
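One way to read the .loc and .iloc failures above: pandas treats a tuple inside [] as one key per axis (row key, then column key). On a one-dimensional Series, the tuple (3, 7) is therefore a row key 3 plus a key for a second axis that doesn't exist. The sketch below checks this. The helper error_name is hypothetical, and the exception is caught generically because the exact class (IndexingError on recent releases) may vary between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 10))


def error_name(indexer, key):
    """Hypothetical helper: name of the exception raised by indexer[key], or None."""
    try:
        indexer[key]
    except Exception as exc:  # IndexingError on recent pandas releases
        return type(exc).__name__
    return None


# The tuple (3, 7) is unpacked as (row_key, key_for_a_second_axis)
loc_error = error_name(s.loc, (3, 7))
iloc_error = error_name(s.iloc, (3, 7))
print(loc_error, iloc_error)

# A one-element tuple is accepted: exactly one key for the Series' one axis
print(s.loc[(3,)])  # 3
```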
Now, I get that you can't even do this with a raw Python list:
In [27]:
x = list(range(0,10))
x[(3,7)]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-27-cefdde088328> in <module>()
1 x = list(range(0,10))
----> 2 x[(3,7)]
TypeError: list indices must be integers or slices, not tuple
To retrieve a set of specific indices from a list, you need to use a comprehension, as explained here.
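The list error follows from Python syntax itself. The comma, not the parentheses, builds the tuple, so x[3, 7] and x[(3, 7)] both hand the single tuple (3, 7) to __getitem__, and a list only accepts integers or slices there. The sketch below shows this with a hypothetical KeyEcho class. It then shows two standard-library ways to pick several elements: a list comprehension and operator.itemgetter. Note that itemgetter returns a bare item, not a 1-tuple, when given a single index.

```python
from operator import itemgetter


class KeyEcho:
    """Hypothetical helper: returns whatever key Python passes to []."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()
print(echo[3, 7])    # (3, 7)
print(echo[(3, 7)])  # (3, 7): the same tuple reaches __getitem__

x = list(range(0, 10))
wanted = (3, 7)

# List comprehension: one ordinary integer lookup per index
picked = [x[i] for i in wanted]
print(picked)  # [3, 7]

# itemgetter with two or more indices returns a tuple of the items
print(itemgetter(*wanted)(x))  # (3, 7)
```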
But on the other hand, using a tuple to select rows from a pandas DataFrame seems to work fine for all three indexing methods. Here's an example with the .loc method:
In [8]:
from pandas import DataFrame
df = DataFrame({"x" : np.arange(0,10)})
In [9]:
df.loc[(3,7),"x"]
Out[9]:
3 3
7 7
Name: x, dtype: int64
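A tuple as the row key is not a documented form of selection, so whether the call above keeps working may depend on the pandas version. A version-independent way to write the same selection is to pass the row labels as a list. The sketch below assumes the same df as above. It also shows that using a list for the column key returns a DataFrame instead of a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(0, 10)})

# Row labels as a list, column label as a scalar: a Series named "x"
col = df.loc[[3, 7], "x"]
print(col.tolist())  # [3, 7]

# Row labels and column labels both as lists: a 2x1 DataFrame
sub = df.loc[[3, 7], ["x"]]
print(sub.shape)  # (2, 1)
```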
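My three questions are:
1. Why don't Series indexers accept a tuple? It would seem natural to use a tuple, since the set of desired indices is a fixed, immutable sequence. Is the tuple rejected only to mimic the list interface?
2. What explains the strange Series .ix result?
3. Why the inconsistency between Series and DataFrame on this matter?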
Reputation: 30424
It's hard to answer this in a systematic way, so I'll just answer list-style:
- Why use () instead of [] when [] is the standard way?
- As for why df.loc worked here and s.loc didn't: neither was guaranteed to work here (according to the documentation), but df.loc happened to. Furthermore, it is quite possible that df.loc would stop working like this in a future version.
- If you find loc/iloc/ix not working as shown in the documentation, that should be pointed out and reported as a bug. I don't believe any of the above falls into that category, but I could certainly be wrong about that.
Reputation: 8906
I think the answer to the first question is that tuples are used to locate labels in a MultiIndex. I don't think there are good answers to the other two questions, except that you've exposed a bug and an inconsistency, respectively, in the code (this isn't that hard to do :)).
So the Series complains because you don't have a MultiIndex or, more generally, because the length of the tuple is greater than the number of levels in your index.
The DataFrame should probably react in the same way, but doesn't.
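To see the case tuples are meant for, here is a minimal sketch with a two-level MultiIndex. The index values and the name s2 are made up for illustration. On such an index, a tuple is a single label with one element per level, so s2.loc[(1, "b")] returns one value instead of raising. A shorter key such as s2.loc[1] selects every row under that first-level value.

```python
import pandas as pd

# Illustrative two-level index: (number, letter)
idx = pd.MultiIndex.from_tuples([(1, "a"), (1, "b"), (2, "a"), (2, "b")])
s2 = pd.Series([10, 20, 30, 40], index=idx)

# The tuple is one label with one entry per level -> a single value
print(s2.loc[(1, "b")])  # 20

# A partial key selects all rows under that first-level value
print(s2.loc[1].tolist())  # [10, 20]
```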
I think the safest way to proceed is to reserve tuples for MultiIndex and to use lists/arrays/series for indexing multiple rows.
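Following that advice, any of the list-like row keys below selects the same two rows from the question's Series. A tuple you already hold in a variable can be converted with list(...) first. This is a sketch assuming the same s = Series(np.arange(0, 10)) as in the question. The variable name wanted is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 10))
wanted = (3, 7)  # e.g. a tuple you already have

# Convert the tuple to a list-like before indexing
as_list = s.loc[list(wanted)]
as_array = s.loc[np.array(wanted)]
as_series = s.loc[pd.Series(wanted)]

for result in (as_list, as_array, as_series):
    print(result.tolist())  # [3, 7] each time
```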
As a side note, you would use a list/array of tuples to select multiple rows in a MultiIndex.
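Here is a short sketch of that side note, using an illustrative two-level index. The labels and the name s2 are assumptions, not taken from the question. Wrapping several tuples in a list selects several rows, one full label per tuple, and the rows come back in the order requested.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, "a"), (1, "b"), (2, "a"), (2, "b")])
s2 = pd.Series([10, 20, 30, 40], index=idx)

# A list of tuples: each tuple is one full label, the list asks for several
picked = s2.loc[[(1, "b"), (2, "a")]]
print(picked.tolist())     # [20, 30]
print(list(picked.index))  # [(1, 'b'), (2, 'a')]
```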