Aaron Lee
Aaron Lee

Reputation: 75

How to conditionally replace the first row of each group with the second row

I want to filter all qualified boss in the dataset. The rule is, each group has one boss, if the status of the boss is "Active" then its valid, if the statue of boss is "Inactive" then we need to replace it using the second row of that group.

For example, in row 1 group A, Allen is Boss but Inactive, then the second row of group A need to replace the first row. In group B, Kim is Boss and Active, so its valid.

Group Name Title Status
A Allen Boss Inactive
A Lisa n Active
A Louis n Active
B Kim Boss Active
B Derek n Active
B Andrew n Active
B Katie n Active
C Lux Boss Inactive

In the end, all inactive Boss row should be filtered out, end result should be like below.

Group Name Title Status
A Lisa n Active
B Kim Boss Active

Upvotes: 0

Views: 301

Answers (2)

BeRT2me
BeRT2me

Reputation: 13251

Drop the Inactive rows, and then take the first line from each group.

df[df.Status.ne('Inactive')].groupby('Group', as_index=False).first()

Output:

  Group  Name Title  Status
0     A  Lisa     n  Active
1     B   Kim  Boss  Active

Upvotes: 1

Michael S.
Michael S.

Reputation: 3128

How's something like this?

df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)

Output:

    Group   Name    Title       Status
0   A       Lisa    n           Active
1   B       Kim     Boss        Active

Stepping through it:

  • df[df.Status.eq("Active")] grabs only the rows where "Status" is "Active"
  • drop_duplicates(subset=["Group"] drops all rows after the first occurrence of a new value in "Group" ... e.g returns the first row with group A, then the first row with Group B, etc
  • ignore_index=True) ignores the above rows' index and resets the index to start back at 0. Without this the index would be [1, 3]

Upvotes: 1

Related Questions