gussilago

Reputation: 932

Masking a DataFrame using multiple criteria

I know one can select certain rows of a DataFrame with a boolean mask, e.g.

(1) mask = df['A']=='a'

where df is a DataFrame with a column named 'A'. Calling df[mask] then yields my new "masked" DataFrame.

One can of course also use multiple criteria with

(2) mask = (df['A']=='a') | (df['A']=='b')

This, however, gets tedious when there are many values to match, e.g.

(3) mask = (df['A']=='a') | (df['A']=='b') | (df['A']=='c') | (df['A']=='d') | ...

Now, say I have my filtering criteria in a list:

(4) filter = ['a', 'b', 'c', 'd', ...]
    # ... here means a lot of other criteria

Is there a way to get the same result as in (3) above, using a one-liner?

Something like:

(5) mask = df.where(df['A']==filter)
    df_new = df[mask]

As written, (5) raises an error.

Upvotes: 7

Views: 7622

Answers (1)

Alex Riley

Reputation: 176810

I would use Series.isin():

filter = ['a', 'b', 'c', 'd']
df_new = df[df["A"].isin(filter)]

df_new is a DataFrame containing only the rows of df whose value in column "A" appears in filter.
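To check that this matches the chained comparison in (3), here is a minimal sketch on made-up data. The sample values and column "B" are assumptions for illustration, and the list is named allowed instead of filter only to avoid shadowing Python's built-in filter().

```python
import pandas as pd

# Hypothetical sample data; only the column name "A" comes from the question.
df = pd.DataFrame({"A": ["a", "b", "x", "c", "y"], "B": [1, 2, 3, 4, 5]})

# Named `allowed` rather than `filter` to avoid shadowing the built-in.
allowed = ["a", "b", "c", "d"]

# Boolean Series: True where the value in column "A" is one of `allowed`.
mask = df["A"].isin(allowed)
df_new = df[mask]

# The chained comparison from (3), written out for the same four values.
chained = (df["A"] == "a") | (df["A"] == "b") | (df["A"] == "c") | (df["A"] == "d")

print(df_new)                 # rows where "A" is a, b, or c
print(mask.equals(chained))   # both masks select the same rows
```

If you need the rows that are not in the list, negate the mask: df[~mask].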

Upvotes: 9
