Amrith Krishna

Reputation: 2853

Use `isin(list1)` in pandas to identify values in a column that have all the items in list1

For a given pandas dataframe like the following,

    h1  h2  h3
    mn  a   1
    mn  b   1
    rs  b   1
    pq  a   1
    we  c   1

if I filter with isin(), e.g. `df[df["h2"].isin(["a","b"])]["h1"].unique()`, I get the following:

    h1
    mn
    rs
    pq
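
For reproducibility, here is a minimal sketch that builds the frame above and shows the any-match behavior of isin(). The names `wanted` and `any_match` are only illustrative and are not part of the original code.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

wanted = ["a", "b"]  # illustrative name for the list passed to isin()

# isin() keeps a row when h2 equals ANY element of the list,
# so every h1 that has at least one matching row survives
any_match = df[df["h2"].isin(wanted)]["h1"].unique()
print(any_match)
```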

Instead of matching any element of the list, I need to find the entries that match all of the elements in the list, i.e. the desired output is:

    h1
    mn

How exactly can this be achieved? The number of elements in the list inside isin() is arbitrary, and can be more than 2.

Upvotes: 1

Views: 495

Answers (2)

ansev

Reputation: 30920

Use groupby.filter with np.isin:

import numpy as np

# Keep the rows of every h1 group whose h2 values include both 'a' and 'b'
new_df = df.groupby('h1').filter(lambda x: np.isin(['a', 'b'], x['h2']).all())
print(new_df)
   h1 h2  h3
0  mn  a   1
1  mn  b   1

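To get only the matching h1 values rather than the rows:

# True for each h1 group whose h2 values contain every element of the list
s = df.groupby('h1')['h2'].apply(lambda x: np.isin(['a', 'b'], x).all())
s.index[s]
#Index(['mn'], dtype='object', name='h1')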
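
Since the list can have any length, the same idea can be wrapped in a small helper. This is only a sketch: the function name `rows_of_groups_with_all` and its parameters are hypothetical, and the frame is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def rows_of_groups_with_all(df, key, col, wanted):
    """Return the rows of df whose `key` group contains every value of `wanted` in `col`."""
    # np.isin(wanted, values) tests each wanted item against the group's values;
    # .all() is True only when every wanted item was found in the group
    return df.groupby(key).filter(
        lambda g: np.isin(wanted, g[col].to_numpy()).all()
    )

# Example frame from the question
df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

two = rows_of_groups_with_all(df, "h1", "h2", ["a", "b"])
three = rows_of_groups_with_all(df, "h1", "h2", ["a", "b", "c"])
print(two)
print(three)
```

With the sample data, only the `mn` group contains both `a` and `b`, and no group contains all of `a`, `b` and `c`, so the second call returns an empty frame.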

Upvotes: 3

jezrael

Reputation: 862611

You can build a boolean mask per group by checking with issubset whether the set of wanted values is contained in each group's h2 values:

s = df.groupby('h1')['h2'].apply(lambda x: set(["a","b"]).issubset(x))
print(s)
h1
mn     True
pq    False
rs    False
we    False
Name: h2, dtype: bool

Then use the mask to select the matching index values:

vals = s.index[s]
print(vals)
Index(['mn'], dtype='object', name='h1')
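
Put together as a self-contained sketch, with the frame rebuilt from the question. The names `wanted`, `has_all` and `rows` are illustrative, and the final isin() step for getting the full rows back is an optional addition, not part of the answer above.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "h1": ["mn", "mn", "rs", "pq", "we"],
    "h2": ["a", "b", "b", "a", "c"],
    "h3": [1, 1, 1, 1, 1],
})

wanted = {"a", "b"}  # can hold any number of elements

# One boolean per h1 group: True when every wanted value occurs in the group's h2
has_all = df.groupby("h1")["h2"].apply(lambda x: wanted.issubset(x))

# h1 labels whose group passed the check
vals = has_all.index[has_all]

# Optional: recover the full rows that belong to those groups
rows = df[df["h1"].isin(vals)]
print(vals)
print(rows)
```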

Upvotes: 4
