Reputation: 13571
I have a pandas Series with the following content.
$ import pandas as pd
$ filter = pd.Series(
    data = [True, False, True, True],
    index = ['A', 'B', 'C', 'D']
)
$ filter.index.name = 'my_id'
$ print(filter)
my_id
A     True
B    False
C     True
D     True
dtype: bool
and a DataFrame like this.
$ df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8]
})
$ print(df)
   A  B   C   D   E
0  1  9  10  43  65
1  2  6  91  12  12
2  9  7  32   7   3
3  4  8  13   9   8
filter has A, B, C, and D as its index labels. df has A, B, C, D, and E as its column names.
True in filter means that the corresponding column in df will be preserved. False in filter means that the corresponding column in df will be removed. Column E in df should also be removed because filter doesn't contain E.
How can I generate another DataFrame with columns B and E removed using filter?
That is, I want to create the following DataFrame from filter and df.
   A   C   D
0  1  10  43
1  2  91  12
2  9  32   7
3  4  13   9
df.loc[:, filter] generates the following error.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1869, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1515, in _getbool_axis
    key = check_bool_indexer(labels, key)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 2486, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
df.loc[:, filter] works if df doesn't contain column E.
In my real case, the DataFrame has about 2000 columns (len(df.columns)) and the Series has about 1999 entries (len(filter)). This makes it difficult for me to determine which elements are in df but not in filter.
Upvotes: 1
Views: 2712
Reputation: 1076
This should give you what you need:
df.loc[:, filter[filter].index]
Explanation: filter[filter] selects the entries of filter whose value is True, and .index takes their labels, which are then used to pick the columns from df by name.
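As a minimal sketch of the steps above, using the sample data from the question (the variable names kept and result are only illustrative, and the index name my_id is left out for brevity):

```python
import pandas as pd

# Sample data from the question. Note that the name `filter` shadows
# Python's built-in filter(); it is kept here only to match the question.
filter = pd.Series([True, False, True, True], index=['A', 'B', 'C', 'D'])
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Step 1: keep only the entries of the Series whose value is True.
true_entries = filter[filter]

# Step 2: take the labels of those entries (here A, C and D).
kept = true_entries.index

# Step 3: select those columns from df by label. E is dropped
# automatically because it never appears in `kept`.
result = df.loc[:, kept]
print(result)
```

This selects by label rather than by boolean mask, so it does not matter that df has more columns than filter has entries. It does assume that every label marked True in filter exists as a column in df; otherwise the label lookup raises a KeyError.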
You cannot use the boolean values in filter directly because its index does not cover every column of df (E is missing), so pandas cannot align the boolean Series with the columns.
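If you would rather keep using a boolean mask, one possible alternative (not part of the original answer, just a sketch) is to reindex filter against df.columns and treat missing labels as False. The same data also lets you list which columns are in df but not in filter via Index.difference. The names aligned, result and missing are illustrative.

```python
import pandas as pd

# Sample data from the question (`filter` shadows the built-in of the
# same name; kept only for consistency with the question).
filter = pd.Series([True, False, True, True], index=['A', 'B', 'C', 'D'])
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Expand the mask so it has one entry per column of df; columns that
# are absent from filter (here E) are filled with False.
aligned = filter.reindex(df.columns, fill_value=False).astype(bool)

# The mask now matches df.columns exactly, so boolean indexing works.
result = df.loc[:, aligned]
print(result)

# Columns that exist in df but have no entry in filter.
missing = df.columns.difference(filter.index)
print(list(missing))
```

With about 2000 columns, the difference call is a quick way to find the one column that has no entry in the Series.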
Upvotes: 2