user3739312
user3739312

Reputation: 99

pandas select rows after groupby

In DataFrames,

 item_#, status, field1, field2 
   123,    "A",    "val1", "val2"
   223,    "B",    "val3", "val4"
   123,    "B",    "val5", "val6"
   323,    "A",    "val7", "val8"

what I want is the list of item_# that has both status "A" and status "B". something like df.groupby('item_#')[(df.status.isin(['A', 'B']), but this doesn't actually work. It gets me all the item that either one of the values in the list.

any suggestion would be appreciated!

Upvotes: 5

Views: 18383

Answers (3)

BENY
BENY

Reputation: 323226

You can using filter + isin

df.groupby('item_#').filter(lambda x : pd.Series(['A','B']).isin(x['status']).all())
Out[473]: 
   item_# status field1 field2
0     123      A   val1   val2
2     123      B   val5   val6

Update method

df[df.groupby('item_#').status.transform(lambda x : {'A','B'}.issubset(set(x)))]
   item_# status field1 field2
0       1      A   val1   val2
2       1      B   val5   val6

Upvotes: 6

Scott Boston
Scott Boston

Reputation: 153460

You can use set:

df[df.groupby('item_#')['status']
     .transform(lambda x: set(x.values.tolist()) == {'A','B'})]

Output:

item_# status field1 field2

0     123      A   val1   val2
2     123      B   val5   val6

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19947

You can also use unstack, dropna and stack:

df.groupby(['item_#','status']).first().unstack().dropna().stack()
Out[136]: 
                    field1   field2
item_# status                      
123        "A"      "val1"   "val2"
           "B"      "val5"   "val6"

Upvotes: 0

Related Questions