ele

Reputation: 13

Pandas | Compare two CSV files and return matches

I'm trying to compare two CSV files and return the rows that match.

CSV1: Contains a list of Keywords.

keywords

Apple
Banana
Orange

CSV2: Contains random content.

content

I like Apples
Banana is my favorite Fruit
Strawberry Smoothies are the best

If I hard-code the keywords like this, I get a decent result:

import pandas as pd

df = pd.read_csv('CSV2.csv')  # the content column lives in CSV2
result = df[df.content.str.contains('Apple|Banana|Orange')]

Since the keyword file keeps growing, I'm looking for a way to read the keywords from the CSV and check for matches, instead of putting all the keywords in the code.

Upvotes: 1

Views: 472

Answers (1)

You could do that by using the pandas isin() function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html)

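A sketch of the code would be:

import pandas as pd

# load the keywords from CSV1 and put them in lower case
list_csv1 = pd.read_csv('CSV1.csv')['keywords'].str.lower().tolist()

# load the content from CSV2
df = pd.read_csv('CSV2.csv')

# use the isin() function; again, put the content in lower case
# note: isin() keeps only rows whose whole value equals a keyword
result = df[df.content.str.lower().isin(list_csv1)]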

Converting everything to lower case isn't mandatory, but it is a good way to normalize your data and avoid missed matches.
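Because the sample content in the question holds whole sentences, a substring match may be closer to what was asked. The following is a minimal sketch of that variant, not a definitive solution: it builds the `str.contains` pattern from the keyword file instead of hard-coding it. The in-memory `csv1` and `csv2` buffers are stand-ins for `CSV1.csv` and `CSV2.csv`, filled with the sample data from the question, and the column names `keywords` and `content` are assumed from the samples.

```python
import io
import re

import pandas as pd

# Stand-ins for CSV1.csv and CSV2.csv (sample data from the question).
csv1 = io.StringIO("keywords\nApple\nBanana\nOrange\n")
csv2 = io.StringIO(
    "content\n"
    "I like Apples\n"
    "Banana is my favorite Fruit\n"
    "Strawberry Smoothies are the best\n"
)

# Read the keywords; drop empty cells so they cannot end up in the pattern.
keywords = pd.read_csv(csv1)["keywords"].dropna().astype(str)
df = pd.read_csv(csv2)

# Escape each keyword so regex metacharacters are matched literally,
# then join them into one alternation such as Apple|Banana|Orange.
pattern = "|".join(re.escape(k) for k in keywords)

# case=False ignores case; na=False treats missing content as "no match".
result = df[df["content"].str.contains(pattern, case=False, na=False)]
print(result)
```

With the sample data this keeps the "I like Apples" and "Banana is my favorite Fruit" rows. If the keyword file could ever be empty, it is worth guarding against that case, since an empty pattern matches every row.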

Upvotes: 1
