Reputation: 99
In DataFrames,
item_#, status, field1, field2
123, "A", "val1", "val2"
223, "B", "val3", "val4"
123, "B", "val5", "val6"
323, "A", "val7", "val8"
what I want is the list of item_#
that has both status "A"
and status "B"
.
something like df.groupby('item_#')[(df.status.isin(['A', 'B']
), but this doesn't actually work. It gets me all the item that either one of the values in the list.
any suggestion would be appreciated!
Upvotes: 5
Views: 18383
Reputation: 323226
You can using filter
+ isin
df.groupby('item_#').filter(lambda x : pd.Series(['A','B']).isin(x['status']).all())
Out[473]:
item_# status field1 field2
0 123 A val1 val2
2 123 B val5 val6
Update method
df[df.groupby('item_#').status.transform(lambda x : {'A','B'}.issubset(set(x)))]
item_# status field1 field2
0 1 A val1 val2
2 1 B val5 val6
Upvotes: 6
Reputation: 153460
You can use set
:
df[df.groupby('item_#')['status']
.transform(lambda x: set(x.values.tolist()) == {'A','B'})]
Output:
item_# status field1 field2
0 123 A val1 val2
2 123 B val5 val6
Upvotes: 1
Reputation: 19947
You can also use unstack, dropna and stack:
df.groupby(['item_#','status']).first().unstack().dropna().stack()
Out[136]:
field1 field2
item_# status
123 "A" "val1" "val2"
"B" "val5" "val6"
Upvotes: 0