null92
null92

Reputation: 145

Multiconditional .isin() filter in pandas

Is it possible to filter a dataframe using multiples .isin() together?

df1
co | no | ds
A1 | yes| text1
A2 | no | text1
A3 | no | text3
df2
co | no | ds
A1 | yes| text1
A5 | no | text5
A6 | yes| text6

Filter if the entire row is in the other dataframe
df1 = df1[df1['co'].isin(df2['co'])] & df1[df1['no'].isin(df2['no'])] & df1[df1['ds'].isin(df2['ds'])]

df1
co | no | ds
A1 | yes| text1

Upvotes: 2

Views: 138

Answers (3)

mozway
mozway

Reputation: 262284

You can use an inner merge:

out = df1.merge(df2)

# or to limit to co/no/ds
cols = ['co', 'no', 'ds']
out = df1.merge(df2[cols])

Or, if you have potentially duplicates in df2, first drop_duplicates:

out = df1.merge(df2.drop_duplicates())

If you really want to use boolean indexing, then you can take advantage of numpy.logical_and.reduce to chain multiple conditions without needing to type all masks:

import numpy as np

cols = ['co', 'no', 'ds']
out = df1[np.logical_and.reduce([df1[c].isin(df2[c]) for c in cols])]

Upvotes: 1

chrslg
chrslg

Reputation: 13511

Boolean arithmetics (&) on series are valid only on boolean series. Like what you give for each filtering.

So you just need to apply this & to the condition. Not to the indexation you get from these booleans series.

In other words

df1[df1['co'].isin(df2['co']) & df1['no'].isin(df2['no']) & df1['ds'].isin(df2['ds'])]

Almost the same line as your non-working one. Except that I apply & on the condition. That is, instead of trying to df1[A] & df1[B] & df1[C] (which doesn't mean much. You could try to intersect those two dataframes df1[A], df1[B] and df1[C]; that costs a bit; but not to apply a &between them) I dodf1[A & B & C]`.

(Note on mozway's solution: that merge, indeed, is an elegant way to intersect the dataframes. But that is almost always slower than computing the boolean series index. Plus, in your case, your logic is already the one you need to compute such indexation. Timeit shows that merge solutions takes 3.59 ms per run, while this one takes 0.83 ms per run)

Upvotes: 0

Surjit Samra
Surjit Samra

Reputation: 4662

yes possible see the changes to your code

df1 = df1[(df1['co'].isin(df2['co']) & df1['no'].isin(df2['no']) &  df1['ds'].isin(df2['ds']))]
df1
Out[87]: 
   co   no     ds
0  A1  yes  text1

Upvotes: 0

Related Questions