Reputation: 1925
I would like mask (or assign 'NA') the value of a column in a dataframe if two conditions are met. This would be relatively straightforward if the conditions were performed row-wise, with something like:
mask = ((df['A'] < x) & (df['B'] < y))
df.loc[mask, 'C'] = 'NA'
but I'm having some trouble figuring out of how to perform this task in my dataframe, which is structured more or less like:
df = pd.DataFrame({ 'A': (188, 750, 1330, 1385, 188, 750, 810, 1330, 1385),
'B': (2, 5, 7, 2, 5, 5, 3, 7, 2),
'C': ('foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar') })
A B C
0 188 2 foo
1 750 5 foo
2 1330 7 foo
3 1385 2 foo
4 188 5 bar
5 750 5 bar
6 810 3 bar
7 1330 7 bar
8 1385 2 bar
The values in column 'A' when 'C' == 'foo'
should also be found when 'C' == 'bar'
(something like an index), although it can have missing data in both 'foo' and 'bar'. How can I mask (or assign 'NA') the rows of column 'B' if both 'foo' and 'bar' are lower than 5 or any of them is missing? In the example above the output would be something like:
A B C
0 188 2 foo
1 750 5 foo
2 1330 7 foo
3 1385 NA foo
4 188 5 bar
5 750 5 bar
6 810 NA bar
7 1330 7 bar
8 1385 NA bar
Upvotes: 4
Views: 4885
Reputation: 1925
Another possible solution using groupby and some other ideas borrowed from jpp's answer:
# create a mapping test for each group from column 'A'
fmap = df.groupby(['A']).apply(lambda x: all(x['B'] < 5))
# and generate a new masking map from that
mask_map = df['A'].map(fmap)
# then just mask the values in the original DF
df['B'] = df['B'].mask(mask_map)
A B C
0 188 2.0 foo
1 750 5.0 foo
2 1330 7.0 foo
3 1385 NaN foo
4 188 5.0 bar
5 750 5.0 bar
6 810 NaN bar
7 1330 7.0 bar
8 1385 NaN bar
Upvotes: 1
Reputation: 164843
Here's one solution. The idea is to construct two Boolean masks, m1
and m2
, from two mapping series, s1
and s2
. Then use pd.Series.mask
to mask series B
.
# create separate mappings for foo and bar
s1 = df.loc[df['C'] == 'foo'].set_index('A')['B']
s2 = df.loc[df['C'] == 'bar'].set_index('A')['B']
# use -np.inf to cover missing mappings
m1 = df['A'].map(s1).fillna(-np.inf).lt(5)
m2 = df['A'].map(s2).fillna(-np.inf).lt(5)
df['B'] = df['B'].mask(m1 & m2)
print(df)
A B C
0 188 2.0 foo
1 750 5.0 foo
2 1330 7.0 foo
3 1385 NaN foo
4 188 5.0 bar
5 750 5.0 bar
6 810 NaN bar
7 1330 7.0 bar
8 1385 NaN bar
Upvotes: 2