Mad Physicist

Reputation: 114230

Apply boolean mask only to indexed portion of a dataframe column

I have a dataframe with some columns:

>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0xFEE7)
>>> df = pd.DataFrame({'A': np.random.randint(10, size=10), 
                       'B': np.random.randint(10, size=10),
                       'C': np.random.choice(['A', 'B'], size=10)})
>>> df
   A  B  C
0  0  0  B
1  4  0  B
2  6  6  A
3  8  3  B
4  0  2  A
5  8  4  A
6  4  1  B
7  8  7  A
8  4  4  A
9  1  1  A

I also have a boolean series that matches part of the index of df:

>>> g = df.groupby('C').get_group('A')
>>> ser = g['B'] > 5
>>> ser
2     True
4    False
5    False
7     True
8    False
9    False
Name: B, dtype: bool

I'd like to be able to use ser to set or extract data from df. For example:

>>> df.loc[ser, 'A'] -= 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1762, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1289, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1914, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1782, in _getbool_axis
    key = check_bool_indexer(labels, key)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 2317, in check_bool_indexer
    raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

The error makes sense, since the index of ser does not match the index of df. How do I tell the dataframe to update only the rows whose labels appear in ser and whose values in ser are True?

Specifically, I am looking to modify entries at indices 2 and 7 only:

>>> df   # after modification
   A  B  C
0  0  0  B
1  4  0  B
2  3  6  A
3  8  3  B
4  0  2  A
5  8  4  A
6  4  1  B
7  5  7  A
8  4  4  A
9  1  1  A

Upvotes: 3

Views: 763

Answers (2)

anky

Reputation: 75080

Since the index of ser does not match the index of the original dataframe, you get that error.
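A minimal sketch of the mismatch, assuming a recent pandas. The frame is rebuilt from the values printed in the question rather than from np.random, so the result does not depend on the seed. It shows that every label of ser exists in df, but df has extra labels (the rows with C == 'B') for which ser has no True/False value, so pandas cannot align the mask row-for-row.

```python
import pandas as pd

# Sample data copied from the df printout in the question.
df = pd.DataFrame({'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
                   'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
                   'C': ['B', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A']})
ser = df.groupby('C').get_group('A')['B'] > 5

# The two indexes are not identical...
same_index = ser.index.equals(df.index)
# ...but every label in ser does exist in df...
subset = bool(ser.index.isin(df.index).all())
# ...and these labels of df have no mask value at all.
missing = df.index.difference(ser.index).tolist()

print(same_index, subset, missing)  # False True [0, 1, 3, 6]
```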

You can solve it in two ways:

Either use Series.reindex with fill_value=False, so that ser is expanded to the full index of df (the missing labels become False) and the indexes are aligned, and then use it with loc:

df.loc[ser.reindex(df.index, fill_value=False), 'A'] -= 3  # or any other assignment
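A self-contained sketch of the reindex approach, assuming a recent pandas and the sample values from the question (rebuilt literally instead of via the random seed). The name mask is just an illustrative local variable. The result matches the "after modification" frame the question asks for: only rows 2 and 7 change.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
                   'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
                   'C': ['B', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A']})
ser = df.groupby('C').get_group('A')['B'] > 5

# Expand the partial mask to df's full index; labels absent from ser
# (the C == 'B' rows) are filled with False, so the dtype stays bool.
mask = ser.reindex(df.index, fill_value=False)

# The mask now aligns with df row-for-row, so loc accepts it.
df.loc[mask, 'A'] -= 3

print(df['A'].tolist())  # [0, 4, 3, 8, 0, 8, 4, 5, 4, 1]
```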

Or you can boolean-index ser with itself so that only the True entries remain, then grab their index and use it with loc:

df.loc[ser[ser].index, 'A'] -= 3  # or any other assignment
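A sketch of the label-based approach on the same sample data (again rebuilt from the printed values, assuming a recent pandas). Since the question asks to both extract and set data, it does both with the same labels; the name labels is only an illustrative local variable.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
                   'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
                   'C': ['B', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A']})
ser = df.groupby('C').get_group('A')['B'] > 5

# ser[ser] keeps only the True entries; its index holds the target labels.
labels = ser[ser].index

# Extract: read column A at those labels.
extracted = df.loc[labels, 'A'].tolist()
print(extracted)  # [6, 8]

# Set: update column A at the same labels only.
df.loc[labels, 'A'] -= 3
print(df['A'].tolist())  # [0, 4, 3, 8, 0, 8, 4, 5, 4, 1]
```

Because labels is a plain list of index labels rather than a boolean mask, no alignment check is involved, which is why it works even though ser covers only part of df.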

Upvotes: 4

viniciusrf1992

Reputation: 313

You could just pass the index of ser to loc, since ser and df share a common index.

df.loc[ser.index, 'A'] -= 3

As @Shubham Sharma pointed out in a comment, the OP wants to modify only the rows where ser is True. This approach selects every row whose C value is 'A', regardless of the values in ser.
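To make the pitfall concrete, here is a small sketch on the question's sample data (rebuilt from the printed values, assuming a recent pandas). Using ser.index alone ignores the True/False values, so all six rows of group 'A' are decremented instead of just rows 2 and 7.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
                   'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
                   'C': ['B', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A']})
ser = df.groupby('C').get_group('A')['B'] > 5

# ser.index contains every label of group 'A': [2, 4, 5, 7, 8, 9].
df.loc[ser.index, 'A'] -= 3

# Rows 4, 5, 8 and 9 change too, even though ser is False there.
print(df['A'].tolist())  # [0, 4, 3, 8, -3, 5, 4, 5, 1, -2]
```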

@anky provided a way to do that:

df.loc[ser[ser].index, 'A'] -= 3

Upvotes: 2
