Reputation: 63
I have 2 dataframes that I would like to merge on a particular column based on a string contains. It seems like the following question, but in a different order: How to merge pandas on string contains?
import pandas as pd
df1 = pd.DataFrame({'Amount':[10, 20, 30], 'Description':['this is a text','this is another text','this is an important']})
df2 = pd.DataFrame({'Text':['another','important'], 'Category':['Another Category','Important Category']})
rhs = (df1.Description
.apply(lambda x: df2[df2['Category']] if df2[df2['Text']] in str(x).lower() else None)
)
(pd.concat([df1.Amount, rhs], axis=1, ignore_index=True)
.rename(columns={0: 'Amount', 1: 'Category'}))
I got the following error message:
KeyError: "None of [Index(['another', 'important'], dtype='object')] are in the [columns]"
This occurs because of the lambda expression. With the df2[df2['Text']] part I try to iterate through the dataframe containing the categories, but this doesn't work.
Upvotes: 6
Views: 1589
Reputation: 46
Assuming that df2 is a unique table of texts and their categories, I suppose this could work. (assuming the dfs are as you've posted)
join_map = {row['Text']:row['Category'] for ind,row in df2.iterrows()}
df1['Category'] = df1['Description'].apply(lambda x: [val for key,val in join_map.items() if key in x][0] if [val for key,val in join_map.items() if key in x] else None)
Upvotes: 2