Reputation: 75
I want to filter all qualified boss in the dataset. The rule is, each group has one boss, if the status of the boss is "Active" then its valid, if the statue of boss is "Inactive" then we need to replace it using the second row of that group.
For example, in row 1 group A, Allen is Boss but Inactive, then the second row of group A need to replace the first row. In group B, Kim is Boss and Active, so its valid.
Group | Name | Title | Status |
---|---|---|---|
A | Allen | Boss | Inactive |
A | Lisa | n | Active |
A | Louis | n | Active |
B | Kim | Boss | Active |
B | Derek | n | Active |
B | Andrew | n | Active |
B | Katie | n | Active |
C | Lux | Boss | Inactive |
In the end, all inactive Boss row should be filtered out, end result should be like below.
Group | Name | Title | Status |
---|---|---|---|
A | Lisa | n | Active |
B | Kim | Boss | Active |
Upvotes: 0
Views: 301
Reputation: 13251
Drop the Inactive
rows, and then take the first line from each group.
df[df.Status.ne('Inactive')].groupby('Group', as_index=False).first()
Output:
Group Name Title Status
0 A Lisa n Active
1 B Kim Boss Active
Upvotes: 1
Reputation: 3128
How's something like this?
df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)
Output:
Group Name Title Status
0 A Lisa n Active
1 B Kim Boss Active
Stepping through it:
df[df.Status.eq("Active")]
grabs only the rows where "Status" is "Active"drop_duplicates(subset=["Group"]
drops all rows after the first occurrence of a new value in "Group" ... e.g returns the first row with group A, then the first row with Group B, etcignore_index=True)
ignores the above rows' index and resets the index to start back at 0. Without this the index would be [1, 3]
Upvotes: 1