mgilbert

Reputation: 3655

IndexingError using Boolean Indexing

I am trying to index a DataFrame using a boolean Series, similar to here:

In [1]: import pandas as pd
In [2]: idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"],
   ...:                name="Currency Pair")
In [3]: pairs = pd.DataFrame({"mean":[3.6,5.1,3.6,2.7], "count":[1,5,8,2]}, index=idx)
In [4]: mask = pairs.reset_index().loc[:,"Currency Pair"].str.contains("USD")

In [5]: pairs.reset_index()[mask]
Out[5]: 
  Currency Pair  count  mean
0       USD.CAD      1   3.6
2       EUR.USD      8   3.6
3       GBP.USD      2   2.7

The above works as expected. However, when I apply the same mask to the original DataFrame, without resetting the index, I get the following error:

In [6]: pairs[mask]
C:\Anaconda\lib\site-packages\pandas\core\frame.py:1808: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  "DataFrame index.", UserWarning)
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-6-9eca5ffbdaf7> in <module>()
----> 1 pairs[mask]

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1772         if isinstance(key, (Series, np.ndarray, Index, list)):
   1773             # either boolean or fancy integer index
-> 1774             return self._getitem_array(key)
   1775         elif isinstance(key, DataFrame):
   1776             return self._getitem_frame(key)

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
   1812             # _check_bool_indexer will throw exception if Series key cannot
   1813             # be reindexed to match DataFrame rows
-> 1814             key = _check_bool_indexer(self.index, key)
   1815             indexer = key.nonzero()[0]
   1816             return self.take(indexer, axis=0, convert=False)

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _check_bool_indexer(ax, key)
   1637         mask = com.isnull(result.values)
   1638         if mask.any():
-> 1639             raise IndexingError('Unalignable boolean Series key provided')
   1640 
   1641         result = result.astype(bool).values

IndexingError: Unalignable boolean Series key provided

I am confused by this error, since my impression was that it is raised when the length of the boolean index differs from that of the DataFrame. That is not the case here, as can be seen below.

In [7]: len(mask)
Out[7]: 4
In [8]: len(pairs)
Out[8]: 4
In [9]: len(pairs.reset_index())
Out[9]: 4

Upvotes: 1

Views: 11624

Answers (2)

Rob

Reputation: 3513

You could use a mask generated from the index directly.

In [22]: mask = pairs.index.str.contains("USD")
In [23]: pairs[mask]
Out[23]: 
               count  mean
Currency Pair             
USD.CAD            1   3.6
EUR.USD            8   3.6
GBP.USD            2   2.7

No need to reindex anything.
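To spell out why this works, here is a minimal sketch that rebuilds the frame from the question. As far as I understand it, calling `.str.contains` on the index gives back a plain boolean array rather than a labelled Series, so there is nothing for pandas to align and the mask is applied to the rows by position. The names `index_mask` and `selected` are just for illustration.

```python
import pandas as pd

# Rebuild the frame from the question.
idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)

# Calling .str.contains on the index yields a boolean array, not a Series,
# so it carries no labels and is matched to the rows by position.
index_mask = pairs.index.str.contains("USD")

# Keeps the rows whose index label contains "USD".
selected = pairs[index_mask]
print(selected)
```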

Upvotes: 2

mgilbert

Reputation: 3655

I figured I would write up the solution @EdChum indicated in the comments. The issue, as he pointed out, is that mask.index does not match pairs.index: mask was built from the reset frame, so it is labelled 0 to 3, while pairs is labelled by currency pair. Replacing the index of mask with the index from pairs gives the expected behaviour.

In [10]: mask.index = pairs.index.copy()
In [11]: pairs[mask]
Out[11]: 
               count  mean
Currency Pair             
USD.CAD            1   3.6
EUR.USD            8   3.6
GBP.USD            2   2.7
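For anyone who wants to see the mismatch directly, the sketch below rebuilds the objects from the question and compares the two indexes. It also shows one alternative that I believe is equivalent here: passing the underlying array via `to_numpy()` drops the labels, so the mask is applied by position. This assumes the rows of the mask are in the same order as the rows of the frame, which holds in this example; `by_position` is an illustrative name.

```python
import pandas as pd

# Rebuild the frame and the mask from the question.
idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)
mask = pairs.reset_index().loc[:, "Currency Pair"].str.contains("USD")

# Same length, different labels: mask is labelled 0..3, pairs by currency pair.
print(len(mask) == len(pairs))
print(mask.index.equals(pairs.index))

# Dropping the labels avoids alignment; the mask is then applied by position.
# Assumes mask rows are in the same order as the rows of pairs.
by_position = pairs[mask.to_numpy()]
print(by_position)
```

Unlike the `mask.index = ...` assignment, this leaves `mask` itself unchanged.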

Upvotes: 4
