chip

Reputation: 59

How to search a pandas DataFrame column and build lists of sentences that contain each search word

I have a DataFrame with a column of sentences. I want to write a function that searches every sentence (each row of the sentences column) for the words in this list: search_words = ['cat', 'dog', 'pet']

It should then build new lists of the sentences that contain each word, e.g. a list of sentences with 'cat', a list of sentences without 'cat', and so on for the other words in search_words.

Upvotes: 0

Views: 66

Answers (2)

Corralien

Reputation: 120391

Use str.extractall to find the rows that match your search_words:

# create a regex of words to search
pat = fr"\b({'|'.join(search_words)})\b"

out = df.join(df['sentences'].str.extractall(pat)
                             .droplevel(1).squeeze()
                             .rename('words'))

At this point, your output looks like:

>>> out
             sentences words
0               my cat   cat
1               my dog   dog
2               my pet   pet
3  your cat and my dog   cat
3  your cat and my dog   dog
4  your dog and my pet   dog
4  your dog and my pet   pet
5  your pet and my cat   pet
5  your pet and my cat   cat

>>> pat
'\\b(cat|dog|pet)\\b'
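To see what the \b word boundaries in that pattern buy you, here is a minimal sketch using the standard re module. The test strings are made up for illustration, and re.escape is an added precaution (not part of the answer above) in case a search word contains regex metacharacters.

```python
import re

search_words = ['cat', 'dog', 'pet']

# Same alternation as above; re.escape is an extra safety step (assumption).
pat = r"\b(" + "|".join(map(re.escape, search_words)) + r")\b"

# Whole words are captured, one entry per occurrence.
whole_words = re.findall(pat, "your cat and my dog")

# Substrings inside longer words ('carpet', 'catalogue') are not matched.
substrings = re.findall(pat, "a carpet catalogue")

print(whole_words)
print(substrings)
```

Without the boundaries, 'pet' would also be found inside 'carpet'.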

Now use pd.crosstab between the 2 columns:

out = pd.crosstab(out['sentences'], out['words']).astype(bool)

Output:

>>> out
words                  cat    dog    pet
sentences
my cat                True  False  False
my dog               False   True  False
my pet               False  False   True
your cat and my dog   True   True  False
your dog and my pet  False   True   True
your pet and my cat   True  False   True

Now you can create any list:

# match 'cat'
>>> out.loc[out['cat']].index.tolist()
['my cat', 'your cat and my dog', 'your pet and my cat']

# no match 'dog'
>>> out.loc[~out['dog']].index.tolist()
['my cat', 'my pet', 'your pet and my cat']
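
If you want every "with" and "without" list in one go, the steps above can be put together roughly like this. It is only a sketch: the sample DataFrame is reconstructed from the output shown above, and the names table, with_word and without_word are mine, not part of pandas.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption).
df = pd.DataFrame({'sentences': ['my cat', 'my dog', 'my pet',
                                 'your cat and my dog',
                                 'your dog and my pet',
                                 'your pet and my cat']})
search_words = ['cat', 'dog', 'pet']

# One row per (sentence, matched word) pair.
pat = fr"\b({'|'.join(search_words)})\b"
words = df['sentences'].str.extractall(pat)[0].droplevel(1).rename('words')
out = df.join(words)

# Boolean table: one row per sentence, one column per search word.
# reindex guarantees a column even for a word that never matched.
table = (pd.crosstab(out['sentences'], out['words'])
           .reindex(columns=search_words, fill_value=0)
           .astype(bool))

# Hypothetical helper dicts: word -> list of sentences.
with_word = {w: table.loc[table[w]].index.tolist() for w in search_words}
without_word = {w: table.loc[~table[w]].index.tolist() for w in search_words}

print(with_word['cat'])
print(without_word['dog'])
```

One caveat: a sentence that matches none of the words gets NaN in the words column, and with the default dropna=True it should not appear in the crosstab at all, so it would be missing from the "without" lists too. If that matters, reindexing the rows against df['sentences'] is one way around it.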

Upvotes: 1

Kevin Amiri

Reputation: 1

import pandas as pd

df = pd.DataFrame({'sentences': ['I have a cat', 'I have a dog', 'I have a pet', 'I have a parrot']})

search_words = ['cat', 'dog', 'pet']

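def search_sentences(df, search_words):
    # Add one boolean column per search word: True where the sentence contains it.
    # Note: str.contains matches substrings, so 'pet' would also match 'carpet'.
    for word in search_words:
        df[word] = df['sentences'].str.contains(word)
    return df  # the same DataFrame, modified in place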

search_sentences(df, search_words)
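
The function above gives boolean columns rather than the lists asked for in the question. A possible way to get from one to the other is sketched below. The function name split_sentences and the extra 'I bought a carpet' row are my own additions for illustration, and the \b word boundaries are a deliberate change from the plain substring match above.

```python
import re
import pandas as pd

# Sample data from the answer above, plus one hypothetical row ('carpet')
# to show the effect of the word boundaries.
df = pd.DataFrame({'sentences': ['I have a cat', 'I have a dog',
                                 'I have a pet', 'I have a parrot',
                                 'I bought a carpet']})
search_words = ['cat', 'dog', 'pet']

def split_sentences(df, search_words):
    """Return {word: (sentences containing the word, sentences without it)}."""
    result = {}
    for word in search_words:
        # Whole-word match; na=False treats missing sentences as non-matches.
        mask = df['sentences'].str.contains(rf"\b{re.escape(word)}\b",
                                            regex=True, na=False)
        result[word] = (df.loc[mask, 'sentences'].tolist(),
                        df.loc[~mask, 'sentences'].tolist())
    return result

lists = split_sentences(df, search_words)
with_pet, without_pet = lists['pet']
print(with_pet)
print(without_pet)
```

Unlike the version above, this leaves df untouched and returns plain Python lists.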

Upvotes: 0
