Reputation: 25
I want to filter out some rows in a dataframe where the Categories column includes one of the items from my list.
It works with exact matches, but not if there is more than one category associated with a record.
For example:
ID | Categories | Date | Users |
---|---|---|---|
1 | A | 01.01.2020 | alfa |
2 | B | 01.01.2020 | beta |
3 | C | 01.01.2020 | nick |
4 | A, C | 01.01.2020 | ray |
5 | A, B | 01.01.2020 | john |
6 | B, C | 01.01.2020 | smith |
So I tried to use the following code:
exclude_list = ['A', 'C']
df = df[~df['Categories'].isin(exclude_list)]
this gives me
ID | Categories | Date | Users |
---|---|---|---|
2 | B | 01.01.2020 | beta |
4 | A, C | 01.01.2020 | ray |
5 | A, B | 01.01.2020 | john |
6 | B, C | 01.01.2020 | smith |
The rows where both A and C are present are still showing in the DataFrame. I want to show only the rows where neither A nor C appears in Categories.
I also tried variations such as ',A', 'A' and 'A,', but no luck.
Upvotes: 1
Views: 291
Reputation: 862511
Use Series.str.contains
to test by substrings, if that is acceptable:
exclude_list = ['A', 'C']
df1 = df[~df['Categories'].str.contains('|'.join(exclude_list))]
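A minimal runnable sketch of this approach, rebuilding the sample DataFrame from the table in the question (the joined pattern `'A|C'` is treated as a regex, so this works as long as the labels contain no regex special characters):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Categories": ["A", "B", "C", "A, C", "A, B", "B, C"],
    "Date": ["01.01.2020"] * 6,
    "Users": ["alfa", "beta", "nick", "ray", "john", "smith"],
})

exclude_list = ['A', 'C']
# Keep rows whose Categories string contains none of the excluded labels
df1 = df[~df['Categories'].str.contains('|'.join(exclude_list))]
print(df1)
```

Only the row with Categories == 'B' survives.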
Another idea is to explode the values split by ', '
and test with isin,
then aggregate with any
to flag indices where at least one value matches:
exclude_list = ['A', 'C']
df1 = df[~df['Categories'].str.split(', ').explode().isin(exclude_list).groupby(level=0).any()]
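A runnable sketch of the explode idea with the same sample data; `explode` keeps the original index on every split value, so grouping by `level=0` collapses the boolean mask back to one value per original row (in older pandas the same aggregation was written `.any(level=0)`, which has since been removed):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Categories": ["A", "B", "C", "A, C", "A, B", "B, C"],
    "Date": ["01.01.2020"] * 6,
    "Users": ["alfa", "beta", "nick", "ray", "john", "smith"],
})

exclude_list = ['A', 'C']
# Split each string into a list, explode to one row per category,
# test membership, then collapse back to one boolean per original index
mask = (df['Categories'].str.split(', ')
          .explode()
          .isin(exclude_list)
          .groupby(level=0)
          .any())
df1 = df[~mask]
print(df1)
```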
A last possible solution, with apply, so maybe slower on real data:
df1 = df[~df['Categories'].apply(lambda x: any([y in exclude_list for y in x.split(', ')]))]
print (df1)
ID Categories Date Users
1 2 B 01.01.2020 beta
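One caveat worth noting: str.contains matches substrings, so excluding 'A' would also hit a hypothetical multi-letter category such as 'AB'. The split-based variants above avoid that. As an alternative sketch (not from the original answer), set disjointness expresses the same exact-token test compactly:

```python
import pandas as pd

# Sample data from the question (columns trimmed for brevity)
df = pd.DataFrame({
    "Categories": ["A", "B", "C", "A, C", "A, B", "B, C"],
    "Users": ["alfa", "beta", "nick", "ray", "john", "smith"],
})

exclude = {'A', 'C'}
# Keep rows whose split categories share no element with the exclude set
df1 = df[df['Categories'].str.split(', ').map(lambda cats: exclude.isdisjoint(cats))]
print(df1)
```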
Upvotes: 3