ah bon

Reputation: 10021

Check whether the two string columns contain each other in Python

Given a small dataset as follows:

   id       a       b
0   1     lol   lolec
1   2   rambo     ram
2   3      ki     pio
3   4    iloc     loc
4   5   strip  rstrip
5   6  lambda  lambda

I would like to create a new column c based on the following criterion:

If a is equal to b, or a is a substring of b, or vice versa, then the new column c should be 1; otherwise it should be 0.
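The criterion can be written as a single pure-Python test. Here is a minimal sketch, where `contains_either` is a hypothetical helper name. It relies on the fact that `in` between two strings is a substring check and that every string contains itself, so equal strings match without a separate equality test:

```python
def contains_either(a: str, b: str) -> int:
    """Return 1 if a is a substring of b or b is a substring of a, else 0.

    Equal strings count as a match because a string is always a substring of itself.
    """
    return int(a in b or b in a)


print(contains_either("lol", "lolec"))  # 1
print(contains_either("ki", "pio"))     # 0
```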

How could I do that in Pandas or Python?

The expected result:

   id       a       b  c
0   1     lol   lolec  1
1   2   rambo     ram  1
2   3      ki     pio  0
3   4    iloc     loc  1
4   5   strip  rstrip  1
5   6  lambda  lambda  1

To check whether a is in b or b is in a, we can use:

df.apply(lambda x: x.a in x.b, axis=1)
df.apply(lambda x: x.b in x.a, axis=1)
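Each of those two calls returns a boolean Series with one value per row, so they can already be combined with an element-wise `|` and cast to 0/1. Here is a sketch that rebuilds the sample data from the question (the `pd.DataFrame` construction is an assumption that mirrors the table above):

```python
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "a": ["lol", "rambo", "ki", "iloc", "strip", "lambda"],
    "b": ["lolec", "ram", "pio", "loc", "rstrip", "lambda"],
})

# Each apply returns a boolean Series, one value per row.
a_in_b = df.apply(lambda x: x.a in x.b, axis=1)
b_in_a = df.apply(lambda x: x.b in x.a, axis=1)

# Element-wise OR of the two checks, then cast True/False to 1/0.
df["c"] = (a_in_b | b_in_a).astype(int)
print(df)
```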

Upvotes: 0

Views: 69

Answers (1)

akuiper

Reputation: 214957

Use zip and a list comprehension:

df['c'] = [int(a in b or b in a) for a, b in zip(df.a, df.b)]

df
   id       a       b  c
0   1     lol   lolec  1
1   2   rambo     ram  1
2   3      ki     pio  0
3   4    iloc     loc  1
4   5   strip  rstrip  1
5   6  lambda  lambda  1
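A self-contained version of the zip approach is shown below, assuming the sample data is built by hand to match the question. `zip(df.a, df.b)` yields one `(a, b)` pair of plain strings per row, so no per-row Series is created:

```python
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "a": ["lol", "rambo", "ki", "iloc", "strip", "lambda"],
    "b": ["lolec", "ram", "pio", "loc", "rstrip", "lambda"],
})

# zip pairs the values of columns a and b row by row;
# `in` between two strings is a substring test.
df["c"] = [int(a in b or b in a) for a, b in zip(df.a, df.b)]
print(df)
```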

Or use apply and combine both conditions with or:

df['c'] = df.apply(lambda r: int(r.a in r.b or r.b in r.a), axis=1)
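As a quick sanity check, here is a sketch that runs the apply version on the same hand-built sample data and confirms it agrees with the zip version. The apply form passes each row as a Series, which is typically slower than zipping plain strings on large frames, but the results are identical:

```python
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "a": ["lol", "rambo", "ki", "iloc", "strip", "lambda"],
    "b": ["lolec", "ram", "pio", "loc", "rstrip", "lambda"],
})

# Row-wise apply: r is one row as a Series, so r.a and r.b are strings.
df["c"] = df.apply(lambda r: int(r.a in r.b or r.b in r.a), axis=1)

# Cross-check against the zip / list-comprehension approach.
via_zip = [int(a in b or b in a) for a, b in zip(df.a, df.b)]
assert df["c"].tolist() == via_zip
print(df)
```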

Upvotes: 6
