Reputation: 858
I want to slice (or filter for) those groups (though I'd settle for rows) in d2.groupby('g') where the max of the values of s in the group, taken over the rows where test is 1, is 2.
pd.__version__
# '0.24.1'
d2 = pd.DataFrame({'g': [1, 1, 1, 2, 2, 2],
                   'test': [1, 1, 0, 1, 0, 0],
                   's': [1, 2, 3, 1, 2, 3]})
d2
g test s
0 1 1 1
1 1 1 2
2 1 0 3
3 2 1 1
4 2 0 2
5 2 0 3
In plain English: I want to keep groups where the maximum value of s that has a corresponding test value of 1 is 2. For the example above, I want to keep group 1 because the second row has test == 1 and s == 2, while the third row has test == 0 (so its s == 3 doesn't count). Expected output for this data:
g test s
0 1 1 1
1 1 1 2
2 1 0 3
I've tried d2.groupby('g').filter(lambda x: (x.test == 1)) # followed by nonsense
, as well as d2.groupby('g')[[d2.s == 1]] # with more nonsense
. The latter gives me the message about Series objects being mutable and unhashable. I've tried a number of other equally senseless and useless approaches. How do I groupby('g')
and filter by the max of s
where test
is 1
? This seems like it should be easy, but I can't get it. I could get there by adding a column, but that seems like a kludge.
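For reference, the add-a-column route I mentioned might look something like this (a sketch; the helper column name is mine):

```python
import pandas as pd

d2 = pd.DataFrame({'g': [1, 1, 1, 2, 2, 2],
                   'test': [1, 1, 0, 1, 0, 0],
                   's': [1, 2, 3, 1, 2, 3]})

# the "kludge": add a helper column holding s only where test == 1,
# then filter groups on that column's max, and drop the helper afterwards
d2['s_if_test'] = d2['s'].where(d2['test'] == 1)
result = d2.groupby('g').filter(lambda x: x['s_if_test'].max() == 2)
result = result.drop(columns='s_if_test')
```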
Note: I have searched for the answer for this. I'd gladly take any search hints about how to find any current solutions for this question, if there are any. Thanks!
Upvotes: 3
Views: 57
Reputation: 402663
You can use groupby
and transform
to compute a mask, something like this:
df[df['s'].where(df['test'] == 1).groupby(df['g']).transform('max') == 2]
g test s
0 1 1 1
1 1 1 2
2 1 0 3
The idea behind Series.where
is to let us only consider values of "s" where "test" is 1.
Here's a similar version, inspired by WeNYoBen's answer, which works because "test" only takes the values 0 and 1 here.
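To make the intermediate steps visible, here is the same pipeline unrolled (variable names are mine):

```python
import pandas as pd

d2 = pd.DataFrame({'g': [1, 1, 1, 2, 2, 2],
                   'test': [1, 1, 0, 1, 0, 0],
                   's': [1, 2, 3, 1, 2, 3]})

# keep s where test == 1, everything else becomes NaN
masked = d2['s'].where(d2['test'] == 1)
# masked: [1.0, 2.0, NaN, 1.0, NaN, NaN]

# the groupwise max ignores NaN, so it only reflects test == 1 rows;
# transform broadcasts it back to the original row index
gmax = masked.groupby(d2['g']).transform('max')
# gmax: [2.0, 2.0, 2.0, 1.0, 1.0, 1.0]

result = d2[gmax == 2]   # rows of group 1 only
```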
df[df['s'].mul(df['test']).groupby(df['g']).transform('max').eq(2)]
g test s
0 1 1 1
1 1 1 2
2 1 0 3
Upvotes: 2
Reputation: 323306
Using filter
d2.groupby('g').filter(lambda x : (x['s']*x['test']).max()==2)
Out[390]:
g s test
0 1 1 1
1 1 2 1
2 1 3 0
Another way is with isin
, since filter
is usually slower on a big DataFrame.
s=d2.s.mul(d2.test).groupby(d2['g']).max()==2
d2.loc[d2.g.isin(s.index[s])]
Out[394]:
g s test
0 1 1 1
1 1 2 1
2 1 3 0
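Unrolled, the isin version builds a per-group boolean Series first and then selects the matching group labels (a sketch; variable names are mine):

```python
import pandas as pd

d2 = pd.DataFrame({'g': [1, 1, 1, 2, 2, 2],
                   'test': [1, 1, 0, 1, 0, 0],
                   's': [1, 2, 3, 1, 2, 3]})

# s * test zeroes out the test == 0 rows (valid because test is 0/1),
# then the groupwise max is compared to 2 -> one boolean per group label
s = d2.s.mul(d2.test).groupby(d2['g']).max() == 2
# s: True for g == 1, False for g == 2

# s.index[s] keeps only the group labels whose flag is True
keep = s.index[s]
result = d2.loc[d2.g.isin(keep)]
```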
Upvotes: 2