Suppose we have a pandas DataFrame `df`:
nickname | event | Date |
---|---|---|
A | 0 | 2020-01-02 |
A | 1 | 2020-01-03 |
B | 0 | 2020-01-02 |
B | 0 | 2020-01-03 |
C | 1 | 2020-01-02 |
C | 1 | 2020-01-03 |
Is there a quick and elegant way to get only the rows whose nickname never had an event of 1? For example:
nickname | event | Date |
---|---|---|
B | 0 | 2020-01-02 |
B | 0 | 2020-01-03 |
I came up with the following code to solve this problem:
```python
df_group = df.groupby('nickname')['event'].max()
df_never = df_group[df_group == 0]
df_only_never = df[df['nickname'].isin(df_never.index)]
```
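For reference, here is a self-contained version of those three steps that rebuilds the sample frame from the table above. Parsing `Date` with `pd.to_datetime` and the names `max_per_nickname` / `never_nicknames` are illustrative choices, not part of the original post.

```python
import pandas as pd

# Sample data from the question (parsing Date as datetime is an assumption;
# plain strings would work just as well for this filter).
df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03"] * 3),
})

# Step 1: one row per nickname holding the largest event value seen.
max_per_nickname = df.groupby("nickname")["event"].max()

# Step 2: keep nicknames whose maximum is 0, i.e. no 1 ever occurred
# (this assumes events are only ever 0 or 1).
never_nicknames = max_per_nickname[max_per_nickname == 0].index

# Step 3: select the original rows belonging to those nicknames.
df_only_never = df[df["nickname"].isin(never_nicknames)]
print(df_only_never)
```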
I have the feeling that there is a one-line way of doing this with Pandas, but I'm not sure how to do it.
Your code is close. Instead of `.max()`, which aggregates and reduces the number of rows, you can use `.transform('max')`, which gives every row its group's maximum (either 0 or 1 for the whole group), so the result stays aligned with `df`. Then, as you did, compare these values with `0` to get a boolean index. Finally, pass the boolean index to `.loc` to select the rows of the group(s) with all zeros:
```python
df_out = df.loc[df.groupby('nickname')['event'].transform('max') == 0]
```
Result:
```python
print(df_out)
```
```
  nickname event Date
2 B 0 2020-01-02
3 B 0 2020-01-03
```
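To see why `.transform('max')` works where `.max()` alone does not, the sketch below prints both on the sample data: `.max()` returns one value per nickname, while `.transform('max')` returns one value per original row, aligned with `df`'s index, so it can be compared with `0` and used as a mask directly. The DataFrame construction and the names `reduced`, `broadcast` and `mask` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": ["2020-01-02", "2020-01-03"] * 3,
})

grouped = df.groupby("nickname")["event"]

# .max() reduces: one value per group, indexed by nickname (3 rows).
reduced = grouped.max()

# .transform("max") broadcasts: each row gets its group's maximum,
# indexed like df itself (6 rows).
broadcast = grouped.transform("max")

# Because broadcast is aligned with df, it works directly as a row mask.
mask = broadcast == 0
df_out = df.loc[mask]

print(reduced.tolist())    # [1, 0, 1]
print(broadcast.tolist())  # [1, 1, 0, 0, 1, 1]
print(df_out)
```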
You can use `.groupby` + `.filter`:
```python
x = df.groupby("nickname").filter(lambda g: g["event"].eq(0).all())
print(x)
```
Prints:
```
  nickname event Date
2 B 0 2020-01-02
3 B 0 2020-01-03
```
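A minimal sketch, assuming the same sample data, that checks the `filter` answer against a vectorized alternative which is not from either answer: collect the nicknames that ever had `event == 1` and drop all of their rows with `~isin`. The names `via_filter`, `had_one` and `via_isin` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": ["2020-01-02", "2020-01-03"] * 3,
})

# groupby + filter: the lambda receives each group's sub-DataFrame and
# returns one bool; groups returning True are kept with all their rows.
via_filter = df.groupby("nickname").filter(lambda g: g["event"].eq(0).all())

# Vectorized alternative (no per-group Python call): find the nicknames
# that ever had event == 1, then drop every row belonging to them.
had_one = df.loc[df["event"].eq(1), "nickname"]
via_isin = df[~df["nickname"].isin(had_one)]

print(via_filter.equals(via_isin))  # True
```

The two agree here because `event` only takes the values 0 and 1. If other values could appear they diverge: the `filter` version keeps only groups whose events are all 0, whereas the `isin` version keeps any nickname that simply never had a 1, which is closer to the literal wording of the question. `filter` also calls the lambda once per group in Python, so with many groups the vectorized forms tend to scale better.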