ViRusi92
ViRusi92

Reputation: 25

Filtering column results based on list values

I want to filter out some rows in a dataframe where the Category Row includes one of the items from my list.

It works with exact matches, but not if there is more than one category associated with a record.

For example:

ID Categories Date Users
1 A 01.01.2020 alfa
2 B 01.01.2020 beta
3 C 01.01.2020 nick
4 A, C 01.01.2020 ray
5 A, B 01.01.2020 john
6 B, C 01.01.2020 smith

So I try to use the following code:

exclude_list = ['A', 'C']
df = df[~df['Categories'].isin(exclude_list)]

this gives me

ID Categories Date Users
2 B 01.01.2020 beta
4 A, C 01.01.2020 ray
5 A, B 01.01.2020 john
6 B, C 01.01.2020 smith

The 'Categories' where A and C are both present is still showing in the DF. I want to only show those Categories where A and C are not present.

I also tried to add variations such as ',A' 'A' 'A,' - but no luck.

Upvotes: 1

Views: 291

Answers (1)

jezrael
jezrael

Reputation: 862511

Use Series.str.contains for test by substrings if possible:

exclude_list = ['A', 'C']
df1 = df[~df['Categories'].str.contains('|'.join(exclude_list))]

Another idea is explode values splitted by , and test with isin with any for match at least one value per index:

exclude_list = ['A', 'C']
df1 = df[~df['Categories'].str.split(', ').explode().isin(exclude_list).any(level=0)]

Last possible solution, with apply, so maybe slowier in real data:

df1 = df[~df['Categories'].apply(lambda x: any([y in exclude_list for y in x.split(', ')]))]

print (df1)
   ID Categories        Date Users
1   2          B  01.01.2020  beta

Upvotes: 3

Related Questions