alwaysaskingquestions

Reputation: 1657

Python pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

So I read in a data table with 29 columns and added one index column (30 in total):

    import os
    import pandas as pd

    Data = pd.read_excel(os.path.join(BaseDir, 'test.xlsx'))
    Data.reset_index(inplace=True)
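To see where the 30th column comes from, here is a small sketch using a hypothetical three-column frame in place of test.xlsx (the column names are made up for illustration). reset_index(inplace=True) turns the old row index into an ordinary column named "index":

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: 3 columns instead of 29
data = pd.DataFrame({"Name": ["a", "b"], "ref_id": [1, 2], "Amount": [10, 20]})

# Move the default 0..n-1 row index into a new column called "index"
data.reset_index(inplace=True)

print(list(data.columns))  # ['index', 'Name', 'ref_id', 'Amount']
print(len(data.columns))   # 4
```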

Then I wanted to subset the data to include only the columns whose name contains "ref" or "Ref". I got the code below from another Stack Overflow post:

    col_keep = Data.ix[:, pd.Series(Data.columns.values).str.contains('ref', case=False)]

However, I keep getting this error:

    print(len(Data.columns.values))
    30
    print(pd.Series(Data.columns.values).str.contains('ref', case=False))
    0     False
    1     False
    2     False
    3     False
    4     False
    5     False
    6     False
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    13    False
    14    False
    15    False
    16    False
    17    False
    18    False
    19    False
    20    False
    21    False
    22    False
    23    False
    24     True
    25     True
    26     True
    27     True
    28    False
    29    False
    dtype: bool

    Traceback (most recent call last):
      File "C:/Users/lala.py", line 26, in <module>
        col_keep = FedexData.ix[:, pd.Series(FedexData.columns.values).str.contains('ref', case=False)]
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 84, in __getitem__
        return self._getitem_tuple(key)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 816, in _getitem_tuple
        retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1014, in _getitem_axis
        return self._getitem_iterable(key, axis=axis)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1041, in _getitem_iterable
        key = check_bool_indexer(labels, key)
      File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1817, in check_bool_indexer
        raise IndexingError('Unalignable boolean Series key provided')
    pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

So the boolean values are correct, but why is it not working? Why does the error keep popping up?

Any help or hints are appreciated! Thank you so much.

Upvotes: 6

Views: 26408

Answers (1)

unutbu

Reputation: 879143

I can reproduce a similar error message this way:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(4, size=(10,4)), columns=list('ABCD'))
    df.ix[:, pd.Series([True,False,True,False])]

raises (using Pandas version 0.21.0.dev+25.g50e95e0):

    pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

The problem occurs because Pandas tries to align the index of the Series with the column index of the DataFrame before masking with the Series' boolean values. Since df has column labels 'A', 'B', 'C', 'D' and the Series has index labels 0, 1, 2, 3, the labels cannot be aligned, so Pandas raises the error. The same thing happens in your code: pd.Series(Data.columns.values) gets a default index 0 through 29, which does not match the column names of Data.
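A minimal sketch of that alignment, reusing the df from the reproduction but indexing with .loc (assumed here because .ix no longer exists in current Pandas). A boolean Series whose index matches the column labels works, and the match is by label rather than by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(4, size=(10, 4)), columns=list("ABCD"))

# Index 0..3 does not match the column labels 'A'..'D', so alignment fails
misaligned = pd.Series([True, False, True, False])
error_name = None
try:
    df.loc[:, misaligned]
except Exception as exc:  # Pandas raises IndexingError here
    error_name = type(exc).__name__
print(error_name)  # IndexingError

# Same values, but indexed by the column labels: alignment succeeds
aligned = pd.Series([True, False, True, False], index=df.columns)
aligned_cols = list(df.loc[:, aligned].columns)
print(aligned_cols)  # ['A', 'C']

# Alignment is by label, not position: the Series' order does not matter
shuffled = pd.Series([False, True, False, True], index=list("DCBA"))
shuffled_cols = list(df.loc[:, shuffled].columns)
print(shuffled_cols)  # ['A', 'C']
```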

You probably don't want any index alignment, so pass a NumPy boolean array instead of a Pandas Series:

    mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
    col_keep = Data.loc[:, mask]
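Here is that fix applied to a small hypothetical frame (the column names are invented for illustration). It also shows a shorter variant that calls .str.contains on Data.columns directly: string methods on an Index return an array rather than a Series, so there is no index to align and the pd.Series(...).values round trip is not needed.

```python
import pandas as pd

# Hypothetical stand-in for Data; only the column names matter here
data = pd.DataFrame(
    [[0, 1, 2, 3, 4]],
    columns=["index", "Name", "ref_no", "Customer Ref", "Amount"],
)

# The answer's approach: .values drops the 0..n-1 index, leaving a plain boolean array
mask = pd.Series(data.columns.values).str.contains("ref", case=False).values
col_keep = data.loc[:, mask]
print(list(col_keep.columns))  # ['ref_no', 'Customer Ref']

# Shorter variant: Index.str.contains returns an array, not a Series
col_keep2 = data.loc[:, data.columns.str.contains("ref", case=False)]
print(list(col_keep2.columns))  # ['ref_no', 'Customer Ref']
```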

The Series.values attribute returns a NumPy array. Also, use Data.loc rather than Data.ix here: .loc supports boolean indexing, and DataFrame.ix was deprecated in Pandas 0.20 and removed in Pandas 1.0.
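As a different route (not the one this answer uses), DataFrame.filter can select columns with a regular expression on their labels, which avoids building a mask at all. A sketch on the same hypothetical column names; the inline (?i) flag makes the match case-insensitive:

```python
import pandas as pd

# Hypothetical stand-in for Data; only the column names matter here
data = pd.DataFrame(
    [[0, 1, 2, 3, 4]],
    columns=["index", "Name", "ref_no", "Customer Ref", "Amount"],
)

# filter(regex=...) keeps columns whose label matches the pattern anywhere (re.search);
# for a DataFrame it filters the columns by default
col_keep = data.filter(regex="(?i)ref")
print(list(col_keep.columns))  # ['ref_no', 'Customer Ref']
```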

Upvotes: 11
