Reputation: 1446
I have a Pandas dataframe:
id attr
1 val1
2 val1||val2
3 val1||val3
4 val3
and a list special_val = ['val1', 'val2', 'val4']
I want to filter the dataframe to keep only the rows whose attr values are ALL in the list. So I need the result to look like this:
id attr
1 val1 #val1 is in special_val
2 val1||val2 #both val1 and val2 are in special_val
I am thinking of using pandas.DataFrame.isin or pandas.Series.isin, but I can't come up with the correct syntax. Could you help?
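For reference, here is a minimal reproduction of the sample data. It assumes that attr holds plain strings with "||" as the separator. It shows why Series.isin alone does not work here: isin compares each whole cell with the list, so a multi-value cell like "val1||val2" never matches. The cells have to be split first.

```python
import pandas as pd

# Sample data from the question; "||" separates multiple values in one cell.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "attr": ["val1", "val1||val2", "val1||val3", "val3"],
})
special_val = ["val1", "val2", "val4"]

# isin compares each whole cell with the list, so "val1||val2" is not matched.
naive = df["attr"].isin(special_val)
print(naive.tolist())  # [True, False, False, False]

# Splitting first gives one list of values per row. "|" is escaped because
# pandas treats a multi-character split pattern as a regular expression.
parts = df["attr"].str.split(r"\|\|")
print(parts.tolist())  # [['val1'], ['val1', 'val2'], ['val1', 'val3'], ['val3']]
```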
Upvotes: 0
Views: 94
Reputation: 2137
You can try the following.
# True only if every '||'-separated value is in special_val (intersection would accept ANY match)
mask = df['attr'].apply(lambda x: set(x.split('||')).issubset(special_val))
df[mask]
Output
id attr
0 1 val1
1 2 val1||val2
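The choice of set operation decides whether the filter means "any" or "all". This sketch uses the sample data from the question to contrast the two. A non-empty intersection keeps a row if ANY of its values is in the list, so id 3 ("val1||val3") slips through. issubset keeps a row only if ALL of its values are in the list, which is the rule the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "attr": ["val1", "val1||val2", "val1||val3", "val3"],
})
special_val = {"val1", "val2", "val4"}

# One set of values per row.
values = df["attr"].str.split(r"\|\|").map(set)

# Non-empty intersection: ANY value is allowed (also keeps id 3).
any_match = values.map(lambda s: bool(s & special_val))
# Subset test: ALL values are allowed (the requested rule).
all_match = values.map(lambda s: s.issubset(special_val))

print(df.loc[any_match, "id"].tolist())  # [1, 2, 3]
print(df.loc[all_match, "id"].tolist())  # [1, 2]
```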
Upvotes: 1
Reputation: 13407
You can do:
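import numpy as np
special_val = {'val1', 'val2', 'val4'}
# split each cell on "||" (raw string, escaped because the pattern is treated as a regex) and make it a set
df["attr2"] = df["attr"].str.split(r"\|\|").map(set)
# keep rows whose set is unchanged by intersecting with special_val, i.e. is a subset of it
df = df.loc[df["attr2"].eq(np.bitwise_and(df["attr2"], special_val))].drop(columns="attr2")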
Outputs:
id attr
0 1 val1
1 2 val1||val2
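For Python sets, (a & b) == a holds exactly when a <= b, meaning a is a subset of b. So the np.bitwise_and comparison can also be written as a plain subset test. The sketch below does this without NumPy or a temporary column. It is an equivalent rewrite, not the answerer's code, and it assumes the same "||"-separated string cells.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "attr": ["val1", "val1||val2", "val1||val3", "val3"],
})
special_val = {"val1", "val2", "val4"}

# set(parts) <= special_val is the subset test that (a & b) == a expresses.
is_subset = df["attr"].str.split(r"\|\|").map(lambda parts: set(parts) <= special_val)
filtered = df.loc[is_subset]
print(filtered)
```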
Upvotes: 0
Reputation: 150815
You can combine str.split, isin(), and groupby():
s = df['attr'].str.split(r'\|+', expand=True).stack().isin(special_val).groupby(level=0).all()
df[s]
Output:
id attr
0 1 val1
1 2 val1||val2
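A variant of the same split / isin / groupby idea swaps in Series.explode for expand=True plus stack(). explode puts each split value on its own row and keeps the original index, so groupby(level=0).all() folds the per-value checks back into one flag per original row. One possible advantage: this avoids depending on how stack() handles the None padding that expand=True creates for rows with fewer values. This is a sketch under the same assumptions as above, not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "attr": ["val1", "val1||val2", "val1||val3", "val3"],
})
special_val = ["val1", "val2", "val4"]

flags = (
    df["attr"]
    .str.split(r"\|\|")   # one list of values per row
    .explode()            # one value per row, original index repeated
    .isin(special_val)    # check each individual value
    .groupby(level=0)     # regroup by the original row index
    .all()                # True only if every value of that row passed
)
print(df[flags])
```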
Upvotes: 2