Reputation: 49
I have 2 dataframes across which I need to map the keywords. The input data(df1) looks like this:
keyword subtopic
post office Brand
uspshelp uspshelp Help
package delivery Shipping
fed ex Brand
ups fedex Brand
delivery done Shipping
united states location
rt ups retweet
This is the other dataframe (df2) which is to be used for keyword matching:
Key Media_type cleaned_text
910040 facebook will take post office
409535 twitter need help with upshelp upshelp
218658 facebook there no section post office alabama ups fedex
218658 facebook there no section post office alabama ups fedex
518903 twitter cant wait see exactly ups fedex truck package
2423281 twitter fed ex messed seedless
763587 twitter crazy package delivery rammed car
827572 twitter formatting idead delivery done
2404106 facebook supoused mexico united states america
1077739 twitter rt ups
I want to map the 'keyword' column in df1 to the 'cleaned_text' column in df2 based on few conditions:
This is how the output dataframe(df3) should look like:
Key Media_type cleaned_text keyword subtopic
910040 facebook will take post office post office Brand
409535 twitter need help with upshelp upshelp uspshelp uspshelp Help
218658 facebook there no section post office alabama ups fedex post office Brand
218658 facebook there no section post office alabama ups fedex ups fedex Brand
518903 twitter cant wait see exactly ups fedex truck package ups fedex Brand
2423281 twitter fed ex messed seedless fed ex messed Brand
763587 twitter crazy package delivery rammed car package delivery Shipping
827572 twitter formatting idead delivery done delivery done Shipping
2404106 facebook supoused mexico united states america united states america location
1077739 twitter rt ups rt ups retweet
Upvotes: 0
Views: 218
Reputation: 998
How about converting your df1 into a dictionary? And then loop through your df2 and search for matches. It is maybe not the most efficient way, but it is very readable
keyword_dict = {row.keyword: row.subtopic for row in df1.itertuples()}
df3_data = []
for row in df2.itertuples():
text = row.cleaned_text
for keyword in keyword_dict:
if keyword in text:
df3_row = [row.Key, row.Media_type, row.cleaned_text, keyword, keyword_dict[keyword]]
df3_data.append(df3_row)
df3_columns = list(df2.columns) + list(df1.columns)
df3 = pd.DataFrame(df3_data, columns=df3_columns)
Upvotes: 1