Filter a pandas DataFrame by checking whether one column's string is contained in another

I have a rules DataFrame with several columns:

col1 ... colx   s1       s2  
'a'      'g'    '123'   '123 school'
'b'      'g'    '456'   '123 school'
'a'      'r'    '123'   '456 school'
'd'      'g'    '456'   '456 school'
'a'      'g'    '123'   '123 school'
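For experimenting, the sample input can be rebuilt as a pandas DataFrame. This is a minimal sketch: it keeps only col1, colx, s1 and s2 (the elided middle columns are left out) and stores the values without the surrounding quotes, which is how the answers below print them.

```python
import pandas as pd

# Sample rules frame from the question; the "..." columns are omitted
rules = pd.DataFrame({
    'col1': ['a', 'b', 'a', 'd', 'a'],
    'colx': ['g', 'g', 'r', 'g', 'g'],
    's1':   ['123', '456', '123', '456', '123'],
    's2':   ['123 school', '123 school', '456 school', '456 school', '123 school'],
})
print(rules)
```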

I need to filter out all rows where 's1' is not a substring of 's2'. The result should be:

col1 ... colx   s1       s2  
'a'      'g'    '123'   '123 school'
'd'      'g'    '456'   '456 school'
'a'      'g'    '123'   '123 school'

I need to do this as fast as possible, so I tried:

rules = rules[rules['s1'] in rules['s2']]

but it does not seem to work
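The attempt fails because Python's in operator is not applied element-wise to pandas objects. For a Series, x in series tests whether x is one of the index labels, not one of the values, and putting a whole Series on the left raises a TypeError because a Series is not hashable. A small sketch of that behaviour, assuming a reasonably recent pandas version:

```python
import pandas as pd

s = pd.Series(['123 school', '456 school'], index=['a', 'b'])

# 'in' on a Series checks the index labels...
label_hit = 'a' in s            # True: 'a' is an index label
value_hit = '123 school' in s   # False: the values are not searched

# ...so a Series on the left-hand side cannot work either
s1 = pd.Series(['123', '456'])
try:
    _ = s1 in s
    raised = False
except TypeError:
    # a Series is mutable and therefore unhashable
    raised = True

print(label_hit, value_hit, raised)  # True False True
```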

Upvotes: 1

Views: 562

Answers (2)

piRSquared

Reputation: 294278

I'd use a list comprehension to build a boolean mask:

df[[x in y for x, y in zip(df.s1, df.s2)]]

  col1 colx   s1          s2
0    a    g  123  123 school
3    d    g  456  456 school
4    a    g  123  123 school
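A self-contained version of the comprehension, run on the question's sample data (rebuilt here with only the columns shown in the output above). zip walks s1 and s2 in lockstep, so each x in y is an ordinary Python substring test, and the resulting list of booleans is used as a row mask.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'a', 'd', 'a'],
    'colx': ['g', 'g', 'r', 'g', 'g'],
    's1':   ['123', '456', '123', '456', '123'],
    's2':   ['123 school', '123 school', '456 school', '456 school', '123 school'],
})

# One plain-Python substring test per row: is s1 contained in s2?
mask = [x in y for x, y in zip(df.s1, df.s2)]
result = df[mask]

print(mask)                    # [True, False, False, True, True]
print(result.index.tolist())   # [0, 3, 4]
```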

You can also use operator.contains with map:

from operator import contains

df[[*map(contains, df.s2, df.s1)]]

  col1 colx   s1          s2
0    a    g  123  123 school
3    d    g  456  456 school
4    a    g  123  123 school
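Note the argument order in the map version: operator.contains(a, b) evaluates b in a, so the container column s2 has to come first and the substring column s1 second. A quick check with plain lists standing in for the columns:

```python
from operator import contains

# contains(a, b) is the same as (b in a)
forward = contains('123 school', '123')   # '123' in '123 school'
swapped = contains('123', '123 school')   # '123 school' in '123'

# map() feeds the two iterables pairwise, like zip()
s1 = ['123', '456', '123']
s2 = ['123 school', '123 school', '456 school']
mask = [*map(contains, s2, s1)]

print(forward, swapped, mask)  # True False [True, False, False]
```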

Upvotes: 5

Sy Ker

Reputation: 2190

# compare the two columns row-wise
boolean_selector = df.apply(lambda x: x['s1'] in x['s2'], axis=1)

# select the matching rows (boolean indexing returns a new DataFrame)
new_df = df[boolean_selector]

# equivalent selection with .loc (also a new DataFrame, not a view)
true_group = df.loc[boolean_selector]

As a one-liner:

df = df[df.apply(lambda x: x['s1'] in x['s2'], axis=1)]
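Both answers build the same boolean mask; they differ in how. With apply(..., axis=1), pandas calls the lambda once per row and passes that row as a Series, which adds per-row overhead, so it is usually slower than the zip comprehension on large frames (a general expectation, not a benchmark run here). A minimal check that the two masks agree on the question's sample data, trimmed to s1 and s2:

```python
import pandas as pd

df = pd.DataFrame({
    's1': ['123', '456', '123', '456', '123'],
    's2': ['123 school', '123 school', '456 school', '456 school', '123 school'],
})

# Row-wise apply: each row arrives as a Series indexed by column name
apply_mask = df.apply(lambda row: row['s1'] in row['s2'], axis=1)

# Same test via zip, without building a Series per row
zip_mask = [x in y for x, y in zip(df.s1, df.s2)]

print(apply_mask.tolist() == zip_mask)  # True
print(df[apply_mask].index.tolist())    # [0, 3, 4]
```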

Upvotes: 1
