Reputation: 313
I have a rules DataFrame with several columns:
col1 ... colx s1 s2
'a' 'g' '123' '123 school'
'b' 'g' '456' '123 school'
'a' 'r' '123' '456 school'
'd' 'g' '456' '456 school'
'a' 'g' '123' '123 school'
I need to filter out all rows where 's1' is not a substring of 's2'. The result would be:
col1 ... colx s1 s2
'a' 'g' '123' '123 school'
'd' 'g' '456' '456 school'
'a' 'g' '123' '123 school'
I need to do this as fast as possible, so I tried:
rules = rules[rules['s1'] in rules['s2']]
but it does not seem to work.
Upvotes: 1
Views: 562
Reputation: 294278
I'd use a comprehension to build a boolean mask:
df[[x in y for x, y in zip(df.s1, df.s2)]]
col1 colx s1 s2
0 a g 123 123 school
3 d g 456 456 school
4 a g 123 123 school
You can also use operator.contains and map:
from operator import contains
df[[*map(contains, df.s2, df.s1)]]
col1 colx s1 s2
0 a g 123 123 school
3 d g 456 456 school
4 a g 123 123 school
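For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds the sample frame from the question, leaving out the columns hidden behind "...", and uses the name df as a stand-in for the asker's rules. It only checks that the two masks above agree and is not a benchmark.

```python
from operator import contains

import pandas as pd

# Sample data from the question; the columns elided by "..." are omitted.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "d", "a"],
    "colx": ["g", "g", "r", "g", "g"],
    "s1": ["123", "456", "123", "456", "123"],
    "s2": ["123 school", "123 school", "456 school", "456 school", "123 school"],
})

# Row-wise substring test: is the s1 value contained in the s2 value?
mask_comprehension = [x in y for x, y in zip(df.s1, df.s2)]

# Same test via operator.contains(container, item), so s2 is passed first.
mask_map = list(map(contains, df.s2, df.s1))

# A plain list of booleans works as a mask when its length matches the frame.
filtered = df[mask_comprehension]
print(filtered)
```

Both masks are ordinary Python lists, and the filtered frame keeps the original index labels (0, 3, 4).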
Upvotes: 5
Reputation: 2190
# compare the columns row-wise
boolean_selector = df.apply(lambda x: x['s1'] in x['s2'], axis=1)
# select the matching rows with plain boolean indexing
new_df = df[boolean_selector]
# equivalent selection with .loc; boolean indexing returns a copy either way
true_group = df.loc[boolean_selector]
As a one-liner:
df = df[df.apply(lambda x: x['s1'] in x['s2'], axis=1)]
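A self-contained sketch of the apply approach, assuming the sample data from the question (without the columns hidden behind "...") and df as the frame name. It also shows that df[mask] and df.loc[mask] select the same rows. Note that apply with axis=1 calls the lambda once per row on a Series, so it generally carries more per-row overhead than a plain comprehension over the two columns.

```python
import pandas as pd

# Sample data from the question; the columns elided by "..." are omitted.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "d", "a"],
    "colx": ["g", "g", "r", "g", "g"],
    "s1": ["123", "456", "123", "456", "123"],
    "s2": ["123 school", "123 school", "456 school", "456 school", "123 school"],
})

# axis=1 hands each row to the lambda as a Series indexed by column name.
boolean_selector = df.apply(lambda row: row["s1"] in row["s2"], axis=1)

# Two spellings of the same boolean selection.
via_brackets = df[boolean_selector]
via_loc = df.loc[boolean_selector]

print(boolean_selector.tolist())     # [True, False, False, True, True]
print(via_brackets.equals(via_loc))  # True
```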
Upvotes: 1