Reputation: 2853
For a given pandas dataframe like the following,
h1 h2 h3
mn a 1
mn b 1
rs b 1
pq a 1
we c 1
if I filter with isin(), say df[df["h2"].isin(["a","b"])]["h1"].unique(), it would result in the following:
h1
mn
rs
pq
Instead of matching any element of the list, I need to find the entries that match all of the elements in the list, i.e. the desired output should be:
h1
mn
How exactly can this be achieved? The number of elements in the list passed to isin() is arbitrary, and can be more than 2.
Upvotes: 1
Views: 495
Reputation: 30920
Use groupby.filter with np.isin:
import numpy as np

# keep only the h1 groups whose h2 values include both 'a' and 'b'
new_df = df.groupby('h1').filter(lambda x: np.isin(['a','b'],x['h2']).all())
print(new_df)
h1 h2 h3
0 mn a 1
1 mn b 1
Or, to get only the matching h1 values, build a boolean mask per group:
s = df.groupby('h1')['h2'].apply(lambda x: np.isin(['a','b'],x).all())
s.index[s]
#Index(['mn'], dtype='object', name='h1')
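Since the question says the list can have any number of elements, here is a minimal self-contained sketch of the same np.isin idea with the list passed in as a parameter. The frame is rebuilt from the question's sample data; the function name groups_with_all and its parameter names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

def groups_with_all(frame, required, key="h1", col="h2"):
    """Return the `key` values whose group contains every item of `required` in `col`.

    Hypothetical helper; `required` may be a list of any length.
    """
    # For each group, check that every required item appears among its values.
    mask = frame.groupby(key)[col].apply(
        lambda x: bool(np.isin(required, x).all())
    )
    mask = mask.astype(bool)
    return mask.index[mask].tolist()

print(groups_with_all(df, ["a", "b"]))
```

Note that the argument order matters: np.isin(required, x) tests each required item against the group's values, so .all() is true only when none of them is missing.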
Upvotes: 3
Reputation: 862611
You can use issubset with set per group to build a boolean mask:
s = df.groupby('h1')['h2'].apply(lambda x: set(["a","b"]).issubset(x))
print(s)
h1
mn True
pq False
rs False
we False
Name: h2, dtype: bool
And then filter index values:
vals = s.index[s]
print(vals)
Index(['mn'], dtype='object', name='h1')
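If the full rows are needed rather than just the h1 labels, the index values can be fed back into isin(). Below is a small runnable sketch of the issubset approach under that assumption; the frame is rebuilt from the question's sample data, and the helper name h1_matching_all is hypothetical.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

def h1_matching_all(frame, required):
    """Hypothetical helper: h1 values whose h2 entries cover all of `required`."""
    required = set(required)  # works for a list of any length
    # issubset accepts any iterable, so the group's Series can be passed directly.
    mask = frame.groupby("h1")["h2"].apply(lambda x: required.issubset(x))
    return mask.index[mask]

vals = h1_matching_all(df, ["a", "b"])
# Select the original rows belonging to the matching groups.
rows = df[df["h1"].isin(vals)]
print(rows)
```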
Upvotes: 4