Filter python Dataframe by comparing string on column values

Question

I have a rules DataFrame with several columns:

col1 ... colx   s1       s2  
'a'      'g'    '123'   '123 school'
'b'      'g'    '456'   '123 school'
'a'      'r'    '123'   '456 school'
'd'      'g'    '456'   '456 school'
'a'      'g'    '123'   '123 school'

I need to filter out all rows where 's1' is not a substring of 's2'. The result would be:

col1 ... colx   s1       s2  
'a'      'g'    '123'   '123 school'
'd'      'g'    '456'   '456 school'
'a'      'g'    '123'   '123 school'

I need to do this in the fastest way possible, so I tried:

rules = rules[rules['s1'] in rules['s2']]

but it does not seem to work.

piRSquared · Accepted Answer

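The in operator is not applied element-wise to a Series, so the check has to be done row by row. I'd use a comprehension to build a boolean mask: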

df[[x in y for x, y in zip(df.s1, df.s2)]]

  col1 colx   s1          s2
0    a    g  123  123 school
3    d    g  456  456 school
4    a    g  123  123 school
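
For anyone who wants to reproduce this, here is a minimal sketch. The DataFrame is rebuilt from the sample in the question, assuming only the four columns shown there (the elided columns between col1 and colx are left out), and the names mask and filtered are just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (only the columns shown there).
df = pd.DataFrame({
    "col1": ["a", "b", "a", "d", "a"],
    "colx": ["g", "g", "r", "g", "g"],
    "s1": ["123", "456", "123", "456", "123"],
    "s2": ["123 school", "123 school", "456 school", "456 school", "123 school"],
})

# One boolean per row: is this row's s1 a substring of this row's s2?
mask = [x in y for x, y in zip(df["s1"], df["s2"])]

# A list of booleans as long as the frame acts as a row mask;
# the surviving rows keep their original index labels.
filtered = df[mask]
print(filtered)
```

This assumes both columns hold only strings. A missing value (NaN) in s2 would make the in test raise a TypeError, so such rows would need to be handled first.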

You can also use operator.contains with map. Note that contains(a, b) tests b in a, so s2 is passed first:

from operator import contains

df[[*map(contains, df.s2, df.s1)]]

  col1 colx   s1          s2
0    a    g  123  123 school
3    d    g  456  456 school
4    a    g  123  123 school
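
The argument order is easy to get backwards, so here is a small sketch of it using plain lists in place of the two columns (the lists s1 and s2 simply copy the sample values from the question).

```python
from operator import contains

# contains(a, b) is the function form of `b in a`: the container comes first.
print(contains("123 school", "123"))  # True
print(contains("123", "123 school"))  # False

# Plain lists standing in for df.s1 and df.s2 from the question.
s1 = ["123", "456", "123", "456", "123"]
s2 = ["123 school", "123 school", "456 school", "456 school", "123 school"]

# map pairs the sequences element by element, so s2 (the haystack)
# has to be passed before s1 (the needle).
mask = [*map(contains, s2, s1)]
print(mask)  # [True, False, False, True, True]
```

The resulting list is the same mask the comprehension produces, so it can be used to index the DataFrame in the same way.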
