Anakin Skywalker

Reputation: 2520

Filtering out dataframe rows whose text does not contain certain words, using Python

I have a dataframe with a column 'text'.

I want to keep only the rows whose 'text' column contains certain strings, and drop everything else. My list of words is long: for example, crime, taxation, etc.

This works for one word:

data_cleaned = data_cleaned.loc[data_cleaned['text'].str.contains('population')].reset_index(drop = True)

How can I match multiple words, so that I keep rows containing not only population but also crime, etc.?

I see answers like this, but it does not work for me.

UPD.

My full list of words looks like this:

key_words = ['population',
                          'migration',
                          'crime',
                          'safety',
                          'taxation',
                          'taxes',
                          'weather', 
                          'climate',
                          'opportunities',
                          'employment',
                          'unemployment',
                          'cultural life',
                          'services',
                          'jobs',
                          'economic growth',
                          'economic decline',
                          'pollution',
                          'environment',
                          'health',
                          'insurance',
                          'education',
                          'natural disaster',
                          'retirement']

Upvotes: 0

Views: 83

Answers (1)

bb1

Reputation: 7873

Assuming that lst is the list of strings, the following would work:

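def selector(s):
    # Return True if the text s contains at least one word from lst.
    # Non-string cells (e.g. NaN) are treated as non-matches; this guard is an added assumption.
    if not isinstance(s, str):
        return False
    for w in lst:
        if w in s:
            return True
    return False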

data_cleaned = data_cleaned.loc[data_cleaned['text'].apply(selector)]
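
A vectorized alternative, as a minimal sketch rather than a definitive solution: join the words into one regular expression and pass it to str.contains, the same method used in the question. The sample dataframe and the shortened key_words list below are made up for illustration; case=False and na=False are optional choices, assuming you want case-insensitive matching and want missing values dropped.

```python
import re
import pandas as pd

# Hypothetical sample data; the column name 'text' follows the question.
data_cleaned = pd.DataFrame({
    "text": [
        "Population growth slowed last year",
        "Nothing relevant here",
        "Rising crime and higher taxes",
        None,
    ]
})

# Shortened, hypothetical keyword list (note the comma after every item).
key_words = ["population", "crime", "taxes", "economic growth"]

# Escape each keyword so it is matched literally, then join with '|' (regex OR).
pattern = "|".join(re.escape(w) for w in key_words)

# case=False ignores case; na=False treats missing values as non-matches.
mask = data_cleaned["text"].str.contains(pattern, case=False, na=False)
data_cleaned = data_cleaned.loc[mask].reset_index(drop=True)
```

Like the loop above, this matches substrings, so 'taxes' would also match inside a longer word. If whole-word matches are needed, wrapping each escaped keyword in \b word boundaries is one option.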

Upvotes: 1
