Reputation: 35
As stated in title, I have a dataframe (let's call this df1) that is similar to this:
Col A | Desc |
---|---|
00001 | Dog |
00002 | dogs |
00003 | cat |
00004 | cats |
00005 | hooman |
I have a list of keywords I want to search, in a second dataframe, df2:
Keyword |
---|
dog |
cats |
bird |
How do I identify all records in df1 that has at least one keyword match from df2, and the final outcome is a dataframe (new or add to df1) that lists all the columns in df1 + the matched keyword? On top of that... ideally case insensitive, and the keyword list entry "dog" would help me also find "dogs" from df1?
Sample Expected Output:
Col A | Desc | Matched Keyword |
---|---|---|
00001 | Dog | dog |
00002 | dogs | dog |
00003 | cat | |
00004 | cats | cats |
00005 | hooman |
I've searched for some time in this site, here are a few other ones I have tried to follow but none of them actually worked. I always get nothing matched.
search dataframe for a keyword in any column and get the rows value matching between two DataFrames using pandas in python searching if anyone of word is present in the another column of a dataframe or in another data frame using python How to search for a keyword in different pandas dataframe and update or create a new column with matching keyword in parent DF
Any help would be great, thanks!
Upvotes: 1
Views: 237
Reputation: 11
import pandas as pd
from typing import List
df1 = pd.DataFrame({'col1': ["0001","0002","0003","0004","0005"], 'values':["dogs","cat","Dog","cats","hooman"]})
df2 = pd.DataFrame({"Keywords": ['dog','cat','bird']})
def find_string_in_substring(value:str, list_of_strings: List[str]):
for sub_value in list_of_strings:
if value.lower() in sub_value.lower() or sub_value.lower() in value.lower():
return sub_value
return False
df1["keyword_from_df2"] = df1["values"].apply(lambda x : find_string_in_substring(x,df2['Keywords'].tolist()))
df1
The logic is pretty straight forward, hope it is good enough, if not I will try to help better!
Upvotes: 0