Anas Baheh

Reputation: 137

Check if pandas string column contains multiple words, in any order

I am working with Twitter data and trying to find strings that contain several specific words. The following lines work for a single word and for an OR condition:

tweets_text[tweets_text.str.contains("break")] #Find strings with the word break

tweets_text[tweets_text.str.contains("break|social|media")] #Find strings with either break or social, or media

I am trying to find the strings that contain all three words (break AND social AND media), in any order.

Upvotes: 2

Views: 3447

Answers (3)

Paschalis Ag

Reputation: 138

Building on @Rutger's code, you can pass the flags parameter to make the match ignore upper and lower case. Check the str.contains documentation for the other available parameters.

import re
tweets_text.loc[tweets_text.str.contains("break", flags=re.IGNORECASE) & tweets_text.str.contains("social", flags=re.IGNORECASE) & tweets_text.str.contains("media", flags=re.IGNORECASE)]

Alternatively, you can get the same result by combining a lambda function with all(), as follows:

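def find_words(data, column_name, list_of_words):
    # Keep the rows whose text in column_name contains every word, ignoring case
    contains_all = lambda text: all(word.lower() in text.lower()
                                    for word in list_of_words)

    return data.loc[data[column_name].apply(contains_all)]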

Upvotes: 0

MAFiA303

Reputation: 1317

import pandas as pd
tweets_text = pd.Series(['break', 'break media social', 'break media'])

Series:

0                 break
1    break media social
2           break media

Extraction:

tweets_text[tweets_text.str.contains('(?=.*break)(?=.*social)(?=.*media)')]

Output:

1    break media social
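If the list of words changes often, the lookahead pattern can be generated instead of typed by hand. The following is only a sketch: all_words_pattern is a hypothetical helper name, the sample tweets are made up, and re.escape is added on the assumption that the words should be matched literally rather than as regex syntax.

```python
import re
import pandas as pd

def all_words_pattern(words):
    """Build one lookahead per word so every word must appear, in any order."""
    # re.escape keeps characters such as '+' or '.' from being read as regex syntax
    return "".join(f"(?=.*{re.escape(word)})" for word in words)

# Made-up sample data for illustration
tweets_text = pd.Series(["break", "break media social", "Social MEDIA break", "break media"])
pattern = all_words_pattern(["break", "social", "media"])

# flags=re.IGNORECASE makes the match case-insensitive
matches = tweets_text[tweets_text.str.contains(pattern, flags=re.IGNORECASE)]
print(matches)
```

Note that this still matches substrings, so "break" would also match inside "breakfast".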

Upvotes: 3

Rutger

Reputation: 603

You can split them up like this:

tweets_text.loc[tweets_text.str.contains("break") & tweets_text.str.contains("social") & tweets_text.str.contains("media")]
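For a longer or variable list of words, the same idea of AND-ing one mask per word can probably be written with functools.reduce. This is a minimal sketch, not part of the original answer: contains_all is a hypothetical helper, the sample data is invented, and it assumes plain substring matching (regex=False), case-insensitive comparison, and that missing values should count as non-matches (na=False).

```python
from functools import reduce
import operator
import pandas as pd

def contains_all(series, words):
    """Return a boolean mask that is True where the text contains every word."""
    # One boolean mask per word; na=False treats missing values as no match
    masks = (series.str.contains(word, case=False, regex=False, na=False)
             for word in words)
    # Start from an all-True mask so an empty word list selects every row
    all_true = pd.Series(True, index=series.index)
    return reduce(operator.and_, masks, all_true)

# Made-up sample data for illustration
tweets_text = pd.Series(["break", "Social media BREAK", None, "break media"])
selected = tweets_text[contains_all(tweets_text, ["break", "social", "media"])]
print(selected)
```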

Upvotes: 1
