Reputation: 59
I have a data frame with a column of sentences. I want to write a function that searches every sentence (each row of the sentences column) for the words in this list:
search_words = ['cat', 'dog', 'pet']
For each word, the function should then build a list of the sentences that contain it and a list of the sentences that do not: for example, one list of sentences with cat and one list of sentences without cat, and likewise for the other words in search_words.
Upvotes: 0
Views: 66
Reputation: 120391
Use str.extractall to find the rows that match your search_words:
# create a regex of words to search
pat = fr"\b({'|'.join(search_words)})\b"
out = df.join(df['sentences'].str.extractall(pat)
                             .droplevel(1).squeeze()
                             .rename('words'))
At this point, your output looks like:
>>> out
             sentences  words
0               my cat    cat
1               my dog    dog
2               my pet    pet
3  your cat and my dog    cat
3  your cat and my dog    dog
4  your dog and my pet    dog
4  your dog and my pet    pet
5  your pet and my cat    pet
5  your pet and my cat    cat
>>> pat
'\\b(cat|dog|pet)\\b'
Now use pd.crosstab between the two columns:
out = pd.crosstab(out['sentences'], out['words']).astype(bool)
Output:
>>> out
words                  cat    dog    pet
sentences
my cat                True  False  False
my dog               False   True  False
my pet               False  False   True
your cat and my dog   True   True  False
your dog and my pet  False   True   True
your pet and my cat   True  False   True
Now you can create any list:
# match 'cat'
>>> out.loc[out['cat']].index.tolist()
['my cat', 'your cat and my dog', 'your pet and my cat']
# no match 'dog'
>>> out.loc[~out['dog']].index.tolist()
['my cat', 'my pet', 'your pet and my cat']
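To get the "with" and "without" lists for every word at once, the boolean table can be looped over. The following is a minimal, self-contained sketch, not a definitive implementation. The sample DataFrame is reconstructed from the output shown above, and the names table and lists are hypothetical. It selects column 0 of the extractall result instead of calling squeeze, and it reindexes the columns so a word that never matches still gets a column; both are my assumptions rather than part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption)
df = pd.DataFrame({'sentences': ['my cat', 'my dog', 'my pet',
                                 'your cat and my dog',
                                 'your dog and my pet',
                                 'your pet and my cat']})
search_words = ['cat', 'dog', 'pet']

# One row per (sentence, matched word) pair
pat = fr"\b({'|'.join(search_words)})\b"
words = df['sentences'].str.extractall(pat).droplevel(1)[0].rename('words')
out = df.join(words)

# Boolean table: one row per sentence, one column per word
table = pd.crosstab(out['sentences'], out['words']).astype(bool)
# Make sure every search word has a column, even if it never matched
table = table.reindex(columns=search_words, fill_value=False)

# Hypothetical result structure: {word: {'with': [...], 'without': [...]}}
lists = {
    word: {
        'with': table.loc[table[word]].index.tolist(),
        'without': table.loc[~table[word]].index.tolist(),
    }
    for word in search_words
}
```

As far as I can tell, a sentence that matches none of the words does not appear in the crosstab at all, so it would be missing from the 'without' lists too. If that matters, build the masks directly on df instead.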
Upvotes: 1
Reputation: 1
import pandas as pd

df = pd.DataFrame({'sentences': ['I have a cat', 'I have a dog', 'I have a pet', 'I have a parrot']})
search_words = ['cat', 'dog', 'pet']

def search_sentences(df, search_words):
    # Add one boolean column per search word (this modifies df in place)
    for word in search_words:
        # Substring match: 'cat' would also match 'category'
        df[word] = df['sentences'].str.contains(word)
    return df

search_sentences(df, search_words)
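The boolean masks can also be turned straight into the lists the question asks for. Below is a rough sketch under a few assumptions: the function name split_by_word and the 'with'/'without' dictionary layout are hypothetical, and the word-boundary pattern with re.escape is my addition so that 'cat' does not match 'category'.

```python
import re
import pandas as pd

df = pd.DataFrame({'sentences': ['I have a cat', 'I have a dog',
                                 'I have a pet', 'I have a parrot']})
search_words = ['cat', 'dog', 'pet']

def split_by_word(df, search_words, col='sentences'):
    """Return {word: {'with': [...], 'without': [...]}} without modifying df."""
    result = {}
    for word in search_words:
        # \b restricts the match to whole words; re.escape guards special characters
        pattern = rf"\b{re.escape(word)}\b"
        mask = df[col].str.contains(pattern, regex=True)
        result[word] = {
            'with': df.loc[mask, col].tolist(),
            'without': df.loc[~mask, col].tolist(),
        }
    return result

lists = split_by_word(df, search_words)
```

Because the masks are computed on df itself, every sentence lands in exactly one of the two lists for each word, including sentences such as 'I have a parrot' that match nothing.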
Upvotes: 0