Reputation: 3655
I am trying to index a DataFrame using a boolean Series, as in the following session:
In [1]: import pandas as pd
In [2]: idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"],
...: name="Currency Pair")
In [3]: pairs = pd.DataFrame({"mean":[3.6,5.1,3.6,2.7], "count":[1,5,8,2]}, index=idx)
In [4]: mask = pairs.reset_index().loc[:,"Currency Pair"].str.contains("USD")
In [5]: pairs.reset_index()[mask]
Out[5]:
Currency Pair count mean
0 USD.CAD 1 3.6
2 EUR.USD 8 3.6
3 GBP.USD 2 2.7
The above works as expected. However, when I try the same with the original DataFrame, without resetting the index, I get the following error:
In [6]: pairs[mask]
C:\Anaconda\lib\site-packages\pandas\core\frame.py:1808: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
"DataFrame index.", UserWarning)
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
<ipython-input-6-9eca5ffbdaf7> in <module>()
----> 1 pairs[mask]
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1772 if isinstance(key, (Series, np.ndarray, Index, list)):
1773 # either boolean or fancy integer index
-> 1774 return self._getitem_array(key)
1775 elif isinstance(key, DataFrame):
1776 return self._getitem_frame(key)
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
1812 # _check_bool_indexer will throw exception if Series key cannot
1813 # be reindexed to match DataFrame rows
-> 1814 key = _check_bool_indexer(self.index, key)
1815 indexer = key.nonzero()[0]
1816 return self.take(indexer, axis=0, convert=False)
C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _check_bool_indexer(ax, key)
1637 mask = com.isnull(result.values)
1638 if mask.any():
-> 1639 raise IndexingError('Unalignable boolean Series key provided')
1640
1641 result = result.astype(bool).values
IndexingError: Unalignable boolean Series key provided
I am confused by this error, since my impression was that it is raised when the length of the boolean indexer differs from that of the DataFrame. That is not the case here, as can be seen below.
In [7]: len(mask)
Out[7]: 4
In [8]: len(pairs)
Out[8]: 4
In [9]: len(pairs.reset_index())
Out[9]: 4
Upvotes: 1
Views: 11624
Reputation: 3513
You could use a mask generated from the index directly.
In [22]: mask = pairs.index.str.contains("USD")
In [23]: pairs[mask]
Out[23]:
count mean
Currency Pair
USD.CAD 1 3.6
EUR.USD 8 3.6
GBP.USD 2 2.7
No need to reindex anything.
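As a minimal sketch of why this works, assuming a reasonably recent pandas: the mask built from `pairs.index.str` is a plain boolean array with no labels of its own, so the selection is purely positional and nothing has to be aligned. The variable name `selected` is only illustrative.

```python
import pandas as pd

idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)

# Built from the index itself: a boolean array, not a labelled Series,
# so pandas has no index to align against pairs.index.
mask = pairs.index.str.contains("USD")

# Rows are picked by position; the original index labels are preserved.
selected = pairs[mask]
print(selected)
```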
Upvotes: 2
Reputation: 3655
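I figured I would write up the solution @EdChum indicated in the comments. The issue, as he pointed out, is that mask.index does not agree with pairs.index: mask carries the default integer index left by reset_index(), while pairs is indexed by currency pair, so the boolean Series cannot be aligned even though the lengths match. Replacing the index of mask with the index from pairs gives the expected behaviour.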
In [10]: mask.index = pairs.index.copy()
In [11]: pairs[mask]
Out[11]:
count mean
Currency Pair
USD.CAD 1 3.6
EUR.USD 8 3.6
GBP.USD 2 2.7
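For anyone who would rather not mutate mask in place, here is a hedged sketch of two alternatives, assuming a pandas version that provides `Series.to_numpy()`. The names `labels_match`, `by_position`, `aligned`, and `by_label` are made up for illustration.

```python
import pandas as pd

idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)
mask = pairs.reset_index()["Currency Pair"].str.contains("USD")

# Diagnose the mismatch: mask is labelled 0..3, pairs by currency pair.
labels_match = mask.index.equals(pairs.index)
print(labels_match)

# Option A: strip the labels so the selection is purely positional.
by_position = pairs[mask.to_numpy()]

# Option B: build a relabelled copy of the mask, leaving mask untouched.
aligned = pd.Series(mask.to_numpy(), index=pairs.index)
by_label = pairs[aligned]

print(by_position)
```

Both options select the same three rows. Option A is only safe when the mask is known to be in the same row order as pairs.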
Upvotes: 4