Using groupby in pandas to filter a dataframe using count and column value

Question

I am trying to clean my dataframe using the groupby function. My columns are ID and event_type. I want a new dataframe in which, if an ID appears in only one row, that row is kept only when its event_type is "a"; otherwise the row is deleted.

The data looks like this (event_type can be "a" or "b"):

+-----+------------+
| ID  | event_type |
+-----+------------+
| xyz | a          |
| pqr | b          |
| xyz | b          |
| rst | a          |
+-----+------------+

Output: since the ID "pqr" occurs only once (the count condition) and its event_type is not "a" (the column-value condition), that row should be removed, so the dataframe becomes:

+-----+------------+
| ID  | event_type |
+-----+------------+
| xyz | a          |
| xyz | b          |
| rst | a          |
+-----+------------+

rpanai · Accepted Answer

You can express your logic inside a groupby. First, build the example data:

    import pandas as pd

    df = pd.DataFrame({"ID": ["xyz", "pqr", "xyz", "rst"],
                       "event_type": ["a", "b", "b", "a"]})

What you are asking for is this test, evaluated once per ID (True means keep the group):

    df.groupby("ID")["event_type"].apply(
        lambda s: not (len(s) == 1 and "a" not in s.values))

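You can check this by printing it: the result is False only for pqr. To actually drop the groups that fail the test, use filter, which keeps the rows of every group for which the function returns True: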

    df = (df.groupby("ID")
            .filter(lambda x: not (len(x) == 1 and "a" not in x["event_type"].values))
            .reset_index(drop=True))
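A vectorized variant (a swapped-in technique, not the accepted answer's method) avoids calling a Python lambda once per group: transform("size") broadcasts each group's row count back onto its rows, and the condition is then checked row by row. This is equivalent here because a group with a single row is "a" exactly when that one row's event_type is "a". The names group_size, mask and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["xyz", "pqr", "xyz", "rst"],
                   "event_type": ["a", "b", "b", "a"]})

# Row count of each row's ID group, aligned with df's index.
group_size = df.groupby("ID")["event_type"].transform("size")

# Keep a row if its ID occurs more than once, or if its event_type is "a".
mask = group_size.gt(1) | df["event_type"].eq("a")
out = df[mask].reset_index(drop=True)
print(out)
```

On large frames with many small groups this usually scales better than filter with a lambda, at the cost of spelling the rule out per row instead of per group.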
