Luke

Reputation: 7089

Pandas efficient check if column contains string in other column

I'm trying to get a boolean index of whether one column contains a string from the same row in another column:

a      b
boop   beep bop
zorp   zorpfoo
zip    foo zip fa

Checking whether column b contains the string from column a in the same row, I'd like to get:

[False, True, True]

Right now I'm trying this approach, but it is slow:

df.apply(lambda row: row['a'] in row['b'], axis=1)

Is there a .str method for this?

Upvotes: 7

Views: 1949

Answers (1)

xmduhan

Reputation: 1025

df.apply(..., axis=1) is very slow; you should avoid it. Compare it with a plain loop over the zipped columns:

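from random import sample
from string import ascii_lowercase  # Python 3 name for string.lowercase
from pandas import DataFrame

# Build 100,000 rows of random strings: 2 letters in 'a', 5 letters in 'b'
df = DataFrame({
    'a': [''.join(sample(ascii_lowercase, 2)) for _ in range(100000)],
    'b': [''.join(sample(ascii_lowercase, 5)) for _ in range(100000)]
})

# Plain Python loop over the zipped columns (%time is an IPython magic)
%time [x in y for x, y in zip(df['a'], df['b'])]

# Row-wise apply, for comparison
%time df.apply(lambda row: row['a'] in row['b'], axis=1)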
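
To get from the zipped loop to the boolean index the question asks for, one option is to wrap the result in a Series that shares the DataFrame's index. This is only a sketch on the question's sample data; the names mask and matches are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# Per-row substring test without df.apply(..., axis=1):
# iterate over both columns in parallel and keep df's index
mask = pd.Series(
    [a in b for a, b in zip(df['a'], df['b'])],
    index=df.index,
)

# The mask can then be used as a boolean index
matches = df[mask]
```

Keeping index=df.index matters if the DataFrame has a non-default index, since boolean indexing with a Series aligns on index labels.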

Upvotes: 3
