Find and Replace in a DataFrame using Pandas in an optimized way

Question

I am trying to find and replace words in about 20K comments. The find-and-replace pairs are stored in a DataFrame with more than 20,000 rows. The comments are in a different DataFrame, also around 20K rows.

Below is an example:

import pandas as pd

df1 = pd.DataFrame({'Data' : ["Hull Damage happened and its insured by maritime hull insurence company","Non Cash Entry and claims are blocked"]})

df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

I expect the following output:

op = ["Hull Damage happened and its insured by maritime hull insurance company", "Blocked and claims are blocked"]
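One common way to avoid a separate pass over the text for every term is to compile all the find terms into a single alternation regex and look up each match's replacement in a dict. This is a minimal sketch under assumptions: matching is case-insensitive and whole-word (the example pairs "Insurence" with "insurence"), and the names `mapping`, `pattern` and `replace_match` are illustrative, not from the question. It keeps the capitalization of the Replace column, so the first row comes out with "Insurance" rather than the lowercase "insurance" shown above.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

# Lower-cased find term -> replacement, so lookups ignore case.
mapping = {find.lower(): repl for find, repl in zip(df2['Find'], df2['Replace'])}

# Longest terms first, so a phrase wins over a shorter term it begins with.
alternation = "|".join(re.escape(term) for term in sorted(mapping, key=len, reverse=True))
# \b on both sides restricts matches to whole words/phrases.
pattern = re.compile(r"\b(?:" + alternation + r")\b", flags=re.IGNORECASE)

def replace_match(match):
    # Fall back to the matched text if case folding produced an unknown key.
    return mapping.get(match.group(0).lower(), match.group(0))

# One regex pass per comment instead of one pass per find term.
df1['Clean'] = df1['Data'].str.replace(pattern, replace_match, regex=True)
print(df1['Clean'].tolist())
# ['Hull Damage happened and its insured by maritime hull Insurance company', 'Blocked and claims are blocked']
```

Each comment is scanned once instead of once per term, but with tens of thousands of alternatives the regex engine still tries many branches at every word boundary, so it is worth timing on the real data.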

Please help.

I am using a loop, but it takes more than 20 minutes: there are 20K records in the data and 30,000 words to be replaced.

KeywordSynonym: DataFrame holding the find-and-replace data from SQL (Synonym = term to find, Keyword = replacement)
backup: DataFrame holding the data to be cleaned

import re

# Flatten the data to be cleaned into one string
backup = str(backup)
# One full pass over the text for every find/replace pair (about 30,000 passes)
for index, row in KeywordSynonym.iterrows():
    word = row["Synonym"].lower()    # term to find
    value = row["Keyword"].lower()   # replacement
    # Whole-word match of the escaped term
    my_regex = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"
    # re.sub returns the text unchanged when there is no match
    backup = re.sub(my_regex, value, backup)

# Split the flattened string back into individual comments
TrainingClaimNotes_KwdSyn_Cmp = backup.split('\'", "\'')
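A different way to avoid one pass per term is to skip regex alternation entirely: split each comment into words and, at every word, look up the next few words (up to the length of the longest find phrase) in a dict keyed by word tuples. The work per comment should then depend on its length and on the longest phrase, not on how many terms there are. This is a sketch under assumptions: `build_lookup` and `replace_phrases` are hypothetical helpers, matching is case-insensitive on `\w+` words, words inside a multi-word phrase may be separated by any run of whitespace, and replacements keep the capitalization of the Replace column.

```python
import re
import pandas as pd

WORD = re.compile(r"\w+")

def build_lookup(find_terms, replacements):
    """Hypothetical helper: map lower-cased phrases (as word tuples) to replacements."""
    lookup = {}
    for find, repl in zip(find_terms, replacements):
        key = tuple(WORD.findall(find.lower()))
        if key:  # skip terms with no word characters
            lookup[key] = repl
    return lookup

def replace_phrases(text, lookup, max_words):
    """Hypothetical helper: replace whole-word phrases found in lookup, longest first."""
    words = list(WORD.finditer(text))
    pieces = []
    last = 0  # offset up to which text has been copied into pieces
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            span = words[i:i + n]
            # Words of a phrase must be separated only by whitespace.
            gaps_ok = all(text[a.end():b.start()].isspace() for a, b in zip(span, span[1:]))
            key = tuple(m.group(0).lower() for m in span)
            if gaps_ok and key in lookup:
                pieces.append(text[last:span[0].start()])
                pieces.append(lookup[key])
                last = span[-1].end()
                i += n
                break
        else:
            i += 1  # no phrase starts at this word
    pieces.append(text[last:])
    return "".join(pieces)

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

lookup = build_lookup(df2['Find'], df2['Replace'])
max_words = max((len(key) for key in lookup), default=1)
df1['Clean'] = df1['Data'].map(lambda s: replace_phrases(s, lookup, max_words))
print(df1['Clean'].tolist())
```

Trying the longest phrase first at each word means "Non Cash Entry" is replaced as a whole even if a shorter term such as "non cash" were also in the list.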
