Brian

Reputation: 13571

Pandas filter DataFrame with Series

I have a pandas Series with the following content.

$ import pandas as pd
$ filter = pd.Series(
    data = [True, False, True, True],
    index = ['A', 'B', 'C', 'D']
    )
$ filter.index.name = 'my_id'

$ print(filter)

my_id
A     True
B    False
C     True
D     True
dtype: bool

and a DataFrame like this.

$ df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8]
})

$ print(df)

   A  B   C   D   E
0  1  9  10  43  65
1  2  6  91  12  12
2  9  7  32   7   3
3  4  8  13   9   8

filter has A, B, C, and D as its index labels. df has A, B, C, D, and E as its column names.

True in filter means that the corresponding column in df will be preserved. False in filter means that the corresponding column in df will be removed. Column E in df should be removed because filter doesn't contain E.

How can I generate another DataFrame with columns B and E removed using filter?

That is, I want to create the following DataFrame from filter and df.

   A   C   D
0  1  10  43
1  2  91  12
2  9  32   7
3  4  13   9

df.loc[:, filter] generates the following error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1869, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1515, in _getbool_axis
    key = check_bool_indexer(labels, key)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 2486, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

df.loc[:, filter] works if df doesn't contain column E.

In my real case, the DataFrame has about 2000 columns (len(df.columns)) and the Series has about 1999 entries (len(filter)), which makes it difficult for me to determine which columns are in df but not in filter.

Upvotes: 1

Views: 2712

Answers (1)

mrhd

Reputation: 1076

This should give you what you need:

df.loc[:, filter[filter].index]

Explanation: filter[filter] keeps only the entries of filter whose value is True, and .index returns their labels. Those labels are then used to pick the columns from df.
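
To make the explanation concrete, here is a minimal sketch that breaks the one-liner into its steps, using the data from the question. The intermediate names kept, labels, and result are only illustrative. It assumes every label that is True in filter also exists as a column in df; otherwise the label lookup would raise a KeyError.

```python
import pandas as pd

# Same data as in the question (note: the name `filter` shadows the Python builtin).
filter = pd.Series(
    data=[True, False, True, True],
    index=pd.Index(['A', 'B', 'C', 'D'], name='my_id'),
)
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Step 1: boolean-index the Series with itself, keeping only the True entries.
kept = filter[filter]

# Step 2: the index of what is left holds the labels of the columns to keep.
labels = kept.index

# Step 3: select those columns by label; E is dropped because it is not in labels.
result = df.loc[:, labels]
print(result)
```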

You cannot use the boolean values in filter directly because its index does not cover every column of df (column E is missing), so pandas cannot align the two.
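
If you would rather keep a boolean mask, one alternative (not part of the original answer, just a sketch) is to align filter to df's columns first with Series.reindex, treating any column that filter does not mention as False. The same snippet also uses Index.difference to list the columns that are in df but not in filter, which may help with the 2000-column case. The names missing, aligned, and result are illustrative.

```python
import pandas as pd

# Same data as in the question (note: the name `filter` shadows the Python builtin).
filter = pd.Series(
    data=[True, False, True, True],
    index=pd.Index(['A', 'B', 'C', 'D'], name='my_id'),
)
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Columns that exist in df but have no entry in filter.
missing = df.columns.difference(filter.index)
print(list(missing))

# Align filter to df's columns; labels absent from filter are filled with False.
aligned = filter.reindex(df.columns, fill_value=False)

# The mask now has exactly one boolean per column, so .loc accepts it.
result = df.loc[:, aligned]
print(result)
```

Unlike the label-based selection, this variant silently ignores labels that appear in filter but not in df, since reindex simply drops them.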

Upvotes: 2
