Eduardo Pacheco

Reputation: 13

Pandas way to check whether an event has already happened

Suppose we have the following pandas DataFrame, df:

nickname  event  Date
A         0      2020-01-02
A         1      2020-01-03
B         0      2020-01-02
B         0      2020-01-03
C         1      2020-01-02
C         1      2020-01-03
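A minimal sketch that rebuilds this sample DataFrame so the snippets on this page can be run as-is. Storing Date as plain strings (rather than datetimes) is an assumption, since the question does not say which dtype it uses.

```python
import pandas as pd

# Sample data from the question; Date kept as plain strings (assumption)
df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": ["2020-01-02", "2020-01-03"] * 3,
})
print(df)
```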

Is there a quick and elegant way to keep only the rows whose nickname never has an event equal to 1? The expected result is:

nickname  event  Date
B         0      2020-01-02
B         0      2020-01-03

I came up with the following code to solve this problem:

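# Maximum event per nickname: 0 means that nickname never had an event of 1
df_group = df.groupby('nickname')['event'].max()
df_never = df_group[df_group == 0]
# Keep the rows whose nickname is among the "never" nicknames
df_only_never = df[df['nickname'].isin(df_never.index)]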

I have the feeling that there is a one-line way of doing this with Pandas, but I'm not sure how to do it.
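One possible one-liner in the spirit of the code above (an alternative, not taken from either answer below): find the nicknames that ever have event == 1 and drop their rows with a negated .isin(). It tests for the value 1 directly instead of relying on max() == 0; the two only agree when event is always 0 or 1, as in the sample. The DataFrame is rebuilt here from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": ["2020-01-02", "2020-01-03"] * 3,
})

# Drop every row whose nickname appears among the rows with event == 1
df_only_never = df[~df["nickname"].isin(df.loc[df["event"] == 1, "nickname"])]
print(df_only_never)
```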

Upvotes: 1

Views: 90

Answers (2)

SeaBean

Reputation: 23217

Your code is close. Instead of .max(), which aggregates each group down to a single row, you can use .transform('max') to broadcast the group's maximum back to every row of that group (either 0 or 1 for the whole group), so the result has the same index as df.

Then, as you did, compare these values with 0 to get a boolean mask.

Finally, pass the boolean mask to .loc to select the rows of the group(s) whose events are all zeros:

df_out = df.loc[df.groupby('nickname')['event'].transform('max') == 0]

Result:

print(df_out)

  nickname  event        Date
2        B      0  2020-01-02
3        B      0  2020-01-03
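To see why this mask lines up with df, here is a small sketch comparing the two calls on the sample data (rebuilt here from the question): .max() returns one value per nickname, indexed by nickname, while .transform('max') returns one value per original row, aligned on df's index.

```python
import pandas as pd

df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": ["2020-01-02", "2020-01-03"] * 3,
})

# One value per group: index is the nickname (A, B, C)
per_group = df.groupby("nickname")["event"].max()

# One value per original row: each row receives its group's maximum
per_row = df.groupby("nickname")["event"].transform("max")

print(per_group.tolist())  # [1, 0, 1]
print(per_row.tolist())    # [1, 1, 0, 0, 1, 1]
```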

Upvotes: 1

Andrej Kesely

Reputation: 195438

You can use .groupby() together with .filter(), which keeps only the groups for which the function returns True:

x = df.groupby("nickname").filter(lambda g: g["event"].eq(0).all())
print(x)

Prints:

  nickname  event        Date
2        B      0  2020-01-02
3        B      0  2020-01-03
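A quick consistency check, assuming the sample DataFrame from the question: both answers should return the same rows. One design difference worth noting is that .filter() calls the Python lambda once per group, whereas the transform-based mask stays vectorized, which is usually faster when there are many groups.

```python
import pandas as pd

df = pd.DataFrame({
    "nickname": ["A", "A", "B", "B", "C", "C"],
    "event": [0, 1, 0, 0, 1, 1],
    "Date": ["2020-01-02", "2020-01-03"] * 3,
})

# Answer 1: broadcast each group's maximum, keep groups whose maximum is 0
via_transform = df.loc[df.groupby("nickname")["event"].transform("max") == 0]

# Answer 2: keep whole groups whose events are all 0
via_filter = df.groupby("nickname").filter(lambda g: g["event"].eq(0).all())

print(via_transform.equals(via_filter))  # True
```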

Upvotes: 2
