lydol

Reputation: 121

Checking if column in dataframe contains any item from list of strings

My goal is to check a dataframe column and, if it contains any item from a list of strings (matches in the example below), create a new dataframe containing all of the rows that match.

With my current code I can collect the matching values from the column, but the result is a plain list. I want a new dataframe that keeps all of the information I had before.

Here is my current code. Rather than ending up with a list, I want the matching rows with all of their original dataframe columns:

matches = ['beat saber', 'half life', 'walking dead', 'population one']
checking = []
# keep the text of every row whose all_text contains at least one of the strings
for x in hot_quest1['all_text']:
    if any(z in x for z in matches):
        checking.append(x)
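To keep whole rows instead of the bare text, one option is to turn the same any() test into one True/False value per row and index the dataframe with that list. This is a minimal sketch, assuming hot_quest1 is a pandas DataFrame with a string column all_text; the sample rows and the title column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for hot_quest1
hot_quest1 = pd.DataFrame({
    "title": ["A", "B", "C"],
    "all_text": [
        "loving beat saber lately",
        "nothing relevant here",
        "the walking dead vr is great",
    ],
})

matches = ['beat saber', 'half life', 'walking dead', 'population one']

# Same substring test as the loop, but record one boolean per row
mask = [any(z in x for z in matches) for x in hot_quest1['all_text']]

# Indexing with the boolean list keeps every column of the matching rows
checking = hot_quest1[mask]
print(checking)
```

Because the mask is built row by row, this still loops in Python; it only changes what is collected.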


Upvotes: 3

Views: 3604

Answers (1)

Simon Crowe

Reputation: 468

Pandas generally allows you to filter data frames without resorting to for loops.

This is one approach that should work:

matches = ['beat saber', 'half life', 'walking dead', 'population one']

# matches_regex is a regular expression that matches any of your strings:
# "beat saber|half life|walking dead|population one"
matches_regex = "|".join(matches)

# matches_bools will be a series of booleans indicating whether there was a match
# for each item in the series
matches_bools = hot_quest1.all_text.str.contains(matches_regex, regex=True)

# You can then use that series of booleans to derive a new data frame 
# containing only matching rows
matched_rows = hot_quest1[matches_bools]
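As a self-contained check of the approach above, here it is run against a small hypothetical stand-in for hot_quest1 (the id column and the sample text are invented for illustration). The original row index labels are preserved in the filtered frame.

```python
import pandas as pd

# Hypothetical sample data; only the all_text column matters for the filter
hot_quest1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "all_text": [
        "half life alyx review",
        "cooking tips",
        "population one update",
        "weather today",
    ],
})

matches = ['beat saber', 'half life', 'walking dead', 'population one']

# Alternation pattern: matches if any one of the phrases appears
matches_regex = "|".join(matches)

# One boolean per row, computed without an explicit Python loop
matches_bools = hot_quest1.all_text.str.contains(matches_regex, regex=True)

# Boolean indexing returns a new DataFrame with all columns of the matching rows
matched_rows = hot_quest1[matches_bools]
print(matched_rows)
```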

Here's the documentation for the str.contains method. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
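If the real data may contain missing values, mixed capitalisation, or search strings with regex metacharacters, str.contains has parameters for that. This is a sketch under those assumptions (the sample rows are hypothetical): re.escape keeps characters such as "+" or "." from being read as regex syntax, case=False ignores capitalisation, and na=False treats missing text as "no match" so the mask stays purely boolean.

```python
import re
import pandas as pd

# Hypothetical sample data with a missing value and mixed case
hot_quest1 = pd.DataFrame({
    "all_text": ["Beat Saber is fun", None, "c++ half life mods", "unrelated"],
})

matches = ['beat saber', 'half life', 'walking dead', 'population one']

# Escape each phrase so it is matched literally, then join as alternatives
matches_regex = "|".join(re.escape(m) for m in matches)

# case=False: case-insensitive match; na=False: missing text counts as no match
matches_bools = hot_quest1["all_text"].str.contains(
    matches_regex, case=False, na=False, regex=True
)
matched_rows = hot_quest1[matches_bools]
print(matched_rows)
```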

Upvotes: 6
