Reputation: 1657
So I read in a data table with 29 columns and added one index column (30 in total).
Data = pd.read_excel(os.path.join(BaseDir, 'test.xlsx'))
Data.reset_index(inplace=True)
Then I wanted to subset the data to include only the columns whose names contain "ref" or "Ref". I got the code below from another Stack Overflow post:
col_keep = Data.ix[:, pd.Series(Data.columns.values).str.contains('ref', case=False)]
However, I keep getting this error:
print(len(Data.columns.values))
30
print(pd.Series(Data.columns.values).str.contains('ref', case=False))
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 True
25 True
26 True
27 True
28 False
29 False
dtype: bool
Traceback (most recent call last):
File "C:/Users/lala.py", line 26, in <module>
col_keep = FedexData.ix[:, pd.Series(FedexData.columns.values).str.contains('ref', case=False)]
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 84, in __getitem__
return self._getitem_tuple(key)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 816, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1014, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1041, in _getitem_iterable
key = check_bool_indexer(labels, key)
File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1817, in check_bool_indexer
raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
So the boolean values are correct, but why is it not working? Why does the error keep popping up?
Any help/hint is appreciated! Thank you so so much.
Upvotes: 6
Views: 26408
Reputation: 879143
I can reproduce a similar error message this way:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(4, size=(10,4)), columns=list('ABCD'))
df.ix[:, pd.Series([True,False,True,False])]
raises (using Pandas version 0.21.0.dev+25.g50e95e0)
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
The problem occurs because Pandas tries to align the index of the Series with the column index of the DataFrame before masking with the Series boolean values. Since df has column labels 'A', 'B', 'C', 'D' and the Series has index labels 0, 1, 2, 3, Pandas complains that the labels are unalignable.
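To make the alignment behaviour concrete, here is a minimal sketch using a small made-up DataFrame (the values and the names positional_mask and labelled_mask are illustrative assumptions, not part of the question). It uses .loc rather than .ix, and catches a generic Exception because the import path of IndexingError differs between Pandas versions.

```python
import pandas as pd

# Small illustrative frame; the values are arbitrary.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=list("ABCD"))

# Boolean Series labelled 0..3: none of these labels match 'A'..'D',
# so Pandas cannot align the mask with the columns.
positional_mask = pd.Series([True, False, True, False])
try:
    df.loc[:, positional_mask]
    aligned = True
except Exception as exc:  # an IndexingError in the versions discussed here
    aligned = False
    print(type(exc).__name__, exc)

# Boolean Series labelled with the column names: alignment succeeds.
labelled_mask = pd.Series([True, False, True, False], index=df.columns)
selected = df.loc[:, labelled_mask]
print(list(selected.columns))
```

The second selection works only because the mask's index matches the column labels, which is exactly what the default 0..3 index fails to do.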
You probably don't want any index alignment, so pass a NumPy boolean array instead of a Pandas Series:
mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
col_keep = Data.loc[:, mask]
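The Series.values attribute returns a NumPy array, which carries no index labels and is therefore applied positionally. And since DataFrame.ix is deprecated and will be removed in future versions of Pandas (it is gone as of Pandas 1.0), use Data.loc instead of Data.ix here, since we want boolean indexing.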
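As a possible variation on the same idea, the mask can be built from the column Index directly, or the selection can be done with DataFrame.filter. This is a sketch under assumptions: the column names below are invented stand-ins for the spreadsheet in the question, and np.asarray is only there to guarantee a label-free array whatever type the string method returns.

```python
import numpy as np
import pandas as pd

# Hypothetical column names standing in for the real spreadsheet.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["id", "Ref_a", "order_ref", "total"],
)

# Build the mask from the column Index itself; np.asarray ensures a plain
# boolean array with no labels, so no alignment is attempted.
mask = np.asarray(df.columns.str.contains("ref", case=False))
col_keep = df.loc[:, mask]

# Alternative: filter keeps the columns whose name matches the regex;
# the inline (?i) flag makes the match case-insensitive.
col_keep_filter = df.filter(regex="(?i)ref")

print(list(col_keep.columns))
print(list(col_keep_filter.columns))
```

Both approaches avoid wrapping the column names in a Series with a default integer index, so the alignment error described above should not arise.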
Upvotes: 11