Reputation: 121
My goal is to check a column of my dataframe against a list of strings (matches in the example below) and create a new dataframe containing every row whose value in that column matches one of them.
With my current code I'm able to collect the values that match, but they end up in a plain list, and I want a new dataframe that keeps all the information the original rows had.
Here is my current code. Rather than ending up with a list, I want the full dataframe rows I had before:
matches = ['beat saber', 'half life', 'walking dead', 'population one']
checking = []
for x in hot_quest1['all_text']:
    if any(z in x for z in matches):
        checking.append(x)
Upvotes: 3
Views: 3604
Reputation: 468
Pandas generally allows you to filter data frames without resorting to for loops.
This is one approach that should work:
matches = ['beat saber', 'half life', 'walking dead', 'population one']
# matches_regex is a regular expression meaning any of your strings:
# "beat saber|half life|walking dead|population one"
matches_regex = "|".join(matches)
# matches_bools will be a series of booleans indicating whether there was a match
# for each item in the series
matches_bools = hot_quest1.all_text.str.contains(matches_regex, regex=True)
# You can then use that series of booleans to derive a new data frame
# containing only matching rows
matched_rows = hot_quest1[matches_bools]
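To try the approach above end to end, here is a minimal sketch with a made-up stand-in for hot_quest1. Only the all_text column name comes from the question; the title column and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real hot_quest1 will have different rows and columns.
hot_quest1 = pd.DataFrame({
    "title": ["post a", "post b", "post c", "post d"],
    "all_text": [
        "beat saber is great",
        "some unrelated post",
        "half life alyx review",
        "nothing to see here",
    ],
})

matches = ['beat saber', 'half life', 'walking dead', 'population one']

# Join the search strings into one alternation pattern.
matches_regex = "|".join(matches)

# Boolean series: True where all_text contains any of the search strings.
matches_bools = hot_quest1["all_text"].str.contains(matches_regex, regex=True)

# Boolean indexing keeps whole rows, so every original column is preserved.
matched_rows = hot_quest1[matches_bools]
print(matched_rows)
```

The result is a regular data frame with the original index labels, so the other columns stay aligned with the text that matched.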
Here's the documentation for the str.contains method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
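A few str.contains options may be worth knowing if the real data is messier than the example. The sketch below is one possible variation, not a required change: it escapes the search strings with re.escape so characters like + or ( are treated literally, uses case=False to ignore capitalization, and uses na=False so missing values count as non-matches instead of producing NaN in the mask. The posts frame and the search terms here are invented for illustration.

```python
import re
import pandas as pd

# Hypothetical data with mixed case, a missing value, and regex special characters.
posts = pd.DataFrame({
    "all_text": ["Beat Saber DLC", None, "c++ (beta) build", "random"],
})
terms = ["beat saber", "c++ (beta)"]

# Escape each term so it is matched as literal text, then join with "|".
pattern = "|".join(re.escape(t) for t in terms)

# case=False ignores capitalization; na=False maps missing values to False.
mask = posts["all_text"].str.contains(pattern, case=False, na=False, regex=True)

hardened_rows = posts[mask]
print(hardened_rows)
```

Without na=False, a missing value in the column would leave NaN in the boolean series, which pandas refuses to use for row selection.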
Upvotes: 6