Reputation: 13
I'm trying to compare two CSV files and return the matches.
CSV1: Contains a list of Keywords.
keywords
Apple
Banana
Orange
CSV2: Contains random content.
content
I like Apples
Banana is my favorite Fruit
Strawberry Smoothies are the best
If I hard-code the keywords like this, I get a decent result:
import pandas as pd
df = pd.read_csv('CSV2.csv')  # CSV2 holds the 'content' column
result = df[df.content.str.contains('Apple|Banana|Orange')]
Since the keyword file keeps growing, I'm looking for a way to read the keywords from a CSV and check for matches, instead of putting all the keywords in the code.
Upvotes: 1
Views: 472
Reputation: 452
You could do that by using the pandas isin() function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html)
The pseudo code would be:
import pandas as pd
# read the keywords from CSV1 and put them in lower case
list_csv1 = pd.read_csv('CSV1.csv')['keywords'].str.lower().tolist()
df = pd.read_csv('CSV2.csv')
# use the isin() function
# again, put the content in lower case
result = df[df.content.str.lower().isin(list_csv1)]
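Putting everything in lower case isn't mandatory, but it is a good way to normalize your data and avoid missed matches. Note that isin() only matches when the whole cell equals one of the keywords, so it will not find a keyword inside a longer sentence such as "I like Apples"; for that, a substring match with str.contains() is needed.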
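If the goal is a substring match, as in the question's str.contains() line, one option is to build the pattern from the keyword file instead of typing it by hand. This is only a sketch: it assumes CSV1 has a 'keywords' column and CSV2 a 'content' column, as in the samples above, and it uses in-memory stand-ins (io.StringIO) for the two files so it runs on its own.

```python
import io
import re

import pandas as pd

# Stand-ins for CSV1.csv and CSV2.csv (assumed layout, copied from the question)
csv1 = io.StringIO("keywords\nApple\nBanana\nOrange\n")
csv2 = io.StringIO(
    "content\n"
    "I like Apples\n"
    "Banana is my favorite Fruit\n"
    "Strawberry Smoothies are the best\n"
)

# Read the keywords and drop empty cells
keywords = pd.read_csv(csv1)["keywords"].dropna().astype(str)
df = pd.read_csv(csv2)

# Join the keywords into one regex alternation: Apple|Banana|Orange
# re.escape keeps characters such as '.' or '+' in a keyword literal
pattern = "|".join(re.escape(k) for k in keywords)

# case=False ignores case; na=False treats missing content as "no match"
mask = df["content"].str.contains(pattern, case=False, na=False)
result = df[mask]
print(result)
```

With real files, the two io.StringIO objects would be replaced by 'CSV1.csv' and 'CSV2.csv'. An empty keyword file would produce an empty pattern, which matches every row, so that case may need a separate check.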
Upvotes: 1