Partial word match between two columns of different pandas dataframes

Question

I have two dataframes like this:

df1 :

df2 :

I am trying to match any term against the text.

My code:

import sys,os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import csv
import re

# data
data1 = {'termID': [1,55,341,41,5685], 'term':['Cardic Arrest','Headache','Chest Pain','Muscle Pain', 'Knee Pain']}
data2 = {'textID': [25,12,52,35], 'text':['Hello Mike, Good Morning!!',
                                         'Oops!! My Knee pains!!',
                                          'Stop Music!! my head pains',
                                          'Arrest Innocent!!'
                                         ]}

#Dataframes 
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Matching logic
matchList=[]
for index_b, row_b in df2.iterrows():
    for index_a, row_a in df1.iterrows():
        if row_a.term.lower() in row_b.text.lower():
            #print(row_b.text, row_a.term)
            matchList.append([row_b.textID, row_b.text, row_a.term, row_a.termID])

cols = ['textID', 'text', 'term', 'termID']
d = pd.DataFrame(matchList, columns = cols)
print(d)

This gave me only a single row as output:

I have two issues to fix:

I am not sure how to get output for any partial match, like this:

DF1 and DF2 are large: around 0.4M and 13M records, respectively.

What is the best way to address these two issues?

Answers (1)
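
One possible approach to both issues, offered as a rough sketch rather than a definitive solution. It assumes that "partial match" means: a text matches a term when any word in the text starts with any word of the term (so "pains" matches "Pain"). That rule is an assumption, not something stated in the question, and it is deliberately loose. The helper names `tokenize`, `matching_term_ids`, `word_to_term_ids` and the cutoff `MIN_LEN` are hypothetical. Instead of comparing every text against every term, it builds a dictionary from term words to term IDs once and then does dictionary lookups per text word, which should scale far better than the nested `iterrows` loops.

```python
import re
from collections import defaultdict

import pandas as pd

df1 = pd.DataFrame({
    "termID": [1, 55, 341, 41, 5685],
    "term": ["Cardic Arrest", "Headache", "Chest Pain", "Muscle Pain", "Knee Pain"],
})
df2 = pd.DataFrame({
    "textID": [25, 12, 52, 35],
    "text": ["Hello Mike, Good Morning!!", "Oops!! My Knee pains!!",
             "Stop Music!! my head pains", "Arrest Innocent!!"],
})

WORD_RE = re.compile(r"[a-z]+")
MIN_LEN = 3  # assumed cutoff: ignore very short words such as "my"


def tokenize(s):
    """Lowercase a string and split it into alphabetic words."""
    return WORD_RE.findall(s.lower())


# Inverted index built once: term word -> IDs of the terms containing it.
word_to_term_ids = defaultdict(set)
for term_id, term in zip(df1["termID"], df1["term"]):
    for word in tokenize(term):
        if len(word) >= MIN_LEN:
            word_to_term_ids[word].add(term_id)


def matching_term_ids(text):
    """Return sorted IDs of terms having a word that is a prefix of a text word."""
    hits = set()
    for token in tokenize(text):
        # Look up every prefix of the token, so "pains" finds "pain".
        for end in range(MIN_LEN, len(token) + 1):
            hits |= word_to_term_ids.get(token[:end], set())
    return sorted(hits)


# Collect (textID, termID) pairs, then join back to recover the text and term.
pairs = [
    (text_id, term_id)
    for text_id, text in zip(df2["textID"], df2["text"])
    for term_id in matching_term_ids(text)
]
matches = (
    pd.DataFrame(pairs, columns=["textID", "termID"])
    .merge(df2, on="textID")
    .merge(df1, on="termID")
    [["textID", "text", "term", "termID"]]
)
print(matches)
```

Because a single shared word is enough, a text containing "pains" is paired with every term that contains "Pain". If that is too permissive, the rule inside `matching_term_ids` could be tightened, for example by requiring all words of a term to be found. Note also that "head" does not match "Headache" under this rule, since the term word is longer than the text word.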
