Fastest way to find a word in a pandas dataframe column

Question

I have a dataframe like this:

name	sentence
Tom	The cat is on the table.
Bob	One might say that caterpillars are majestic

I want to get a dataframe like this as a result:

name	sentence	contains_cat
Tom	The cat is on the table.	True
Bob	One might say that caterpillars are majestic	False

The column "contains_cat" should be True only if the corresponding row of column "sentence" contains cat as a whole word (not as part of caterpillar, for example).

I wrote code that does this by searching for substrings like " cat " or " cat.". Is it possible to speed this up, given that I'd like to do this for large dataframes and for many words?

import pandas as pd

df = pd.DataFrame({'name': ['Tom', 'Bob'],
              'sentence': ['The cat is on the table.', 'One might say that caterpillars are majestic']})
df['contains_cat'] = False

string_to_find = [' cat ',
                  'Cat ',
                  ' cat.']
for ii in range(0,len(string_to_find)):
    df1 = pd.DataFrame({'dummy': [string_to_find[ii]] * len(df)})
    df['contains_cat'] = df['contains_cat'] | \
                         [x[0] in x[1] for x in zip(df1['dummy'], df['sentence'])]

print(df)
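
For comparison, here is a minimal sketch of a vectorized alternative that might be faster than the Python-level loop above. It uses pandas' `Series.str.contains` with a regex word boundary (`\b`) instead of listing variants such as " cat " and " cat.". The `words` list and the `contains_<word>` column naming are assumptions for illustration, and matching case-insensitively is also an assumption (the code above only matches "cat" and "Cat"). I have not benchmarked it on large dataframes.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'name': ['Tom', 'Bob'],
    'sentence': ['The cat is on the table.',
                 'One might say that caterpillars are majestic'],
})

# Hypothetical list of words to look for; one boolean column per word.
words = ['cat', 'table']

for word in words:
    # \b matches a word boundary, so "cat" does not match inside "caterpillars".
    # re.escape guards against words that contain regex metacharacters.
    pattern = rf'\b{re.escape(word)}\b'
    # case=False is an assumption: it matches "cat", "Cat", "CAT", and so on.
    df[f'contains_{word}'] = df['sentence'].str.contains(
        pattern, case=False, regex=True
    )

print(df)
```

If "sentence" can contain missing values, passing `na=False` to `str.contains` keeps the result column strictly boolean.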