Reputation: 126
I am trying to clean my dataframe using the groupby function. I have ID and event_type as my columns. I want a new dataframe where, if an ID appears in only one row, that row's event_type must be a. If it is not, that row should be deleted.
The data looks like this (event_type can be "a" or "b"):
+-----+------------+
| ID | event_type |
+-----+------------+
| xyz | a |
| pqr | b |
| xyz | b |
| rst | a |
+-----+------------+
Output:
Since the ID "pqr" occurs only once (its count is 1) and its event_type is not a, the dataframe should become the following:
+-----+------------+
| ID | event_type |
+-----+------------+
| xyz | a |
| xyz | b |
| rst | a |
+-----+------------+
Upvotes: 0
Views: 39
Reputation: 13437
You can apply your logic inside a groupby:
import pandas as pd
df = pd.DataFrame({"ID": ['xyz', 'pqr', 'xyz', 'rst'],
                   "event_type": ['a', 'b', 'b', 'a']})
What you are asking for is this condition, evaluated once per ID:
# True for every ID to keep; False only for a single-row ID whose event_type is not "a"
df.groupby("ID")\
  .apply(lambda x: not (len(x) == 1 and
                        "a" not in x["event_type"].values))
You can check this by printing it. Finally, to apply it as a filter, run:
# Keep only the groups for which the condition is True, then renumber the rows
df = df.groupby("ID")\
       .filter(lambda x: not (len(x) == 1 and
                              "a" not in x["event_type"].values))\
       .reset_index(drop=True)
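If the lambda-based filter turns out to be slow on a large dataframe, the same rule can probably be written as a boolean mask instead. This is only a sketch of an alternative, not part of the approach above; the names group_size, keep, and result are made up for illustration. It relies on the fact that a single-row group contains "a" exactly when that one row's event_type is "a".

```python
import pandas as pd

df = pd.DataFrame({"ID": ['xyz', 'pqr', 'xyz', 'rst'],
                   "event_type": ['a', 'b', 'b', 'a']})

# Number of rows sharing each ID, broadcast back onto every row
group_size = df.groupby("ID")["ID"].transform("size")

# Keep a row if its ID occurs more than once, or if its event_type is "a"
keep = (group_size > 1) | (df["event_type"] == "a")

result = df[keep].reset_index(drop=True)
print(result)
```

On the sample data this should drop only the "pqr" row, the same as the filter version.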
Upvotes: 1