Rahul rajan

Reputation: 1266

How to replace a string using a dictionary containing multiple values for a key in Python

I have a dictionary that maps each word to its closest related words.

I want to replace the related words in each string with the original word. Currently I can replace words when a key has only one value, but I cannot replace them when a key has multiple values. How can this be done?

Example Input

North Indian Restaurant
South India  Hotel
Mexican Restrant
Italian  Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.

How the dictionary is made

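import pandas as pd

# model is a trained gensim Word2Vec model; word is assumed to be an iterable of words
# (list() of a plain string would split it into single characters)
words = list(word)
# keep only neighbours whose similarity score is above 0.7
similar = [[item[0] for item in model.wv.most_similar(w) if item[1] > 0.7] for w in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})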

The result is a DataFrame with two columns that hold lists:

Orginal_Word    Related_Words
[Indian]        [India,Ind,ind.]    
[Restaurant]    [Hotel,Restrant,Hotpot]   
[Pub]           [Bar,Baar, Beer]     
[1888]          [188, 188., 18] 

Dictionary

similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()

{'Indian ': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}
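Because the Related_Words column holds lists, to_dict() returns lists as the values, not the comma-separated strings shown above (the answer's v1.split(',') assumes strings). A minimal sketch using the rows from the table, with the Word2Vec step left out and the words stored as plain strings:

```python
import pandas as pd

# rows taken from the table above; Related_Words holds Python lists
similarity_matrix = pd.DataFrame({
    'Orginal_Word': ['Indian', 'Restaurant', 'Pub', '1888'],
    'Related_Words': [['India', 'Ind', 'ind.'],
                      ['Hotel', 'Restrant', 'Hotpot'],
                      ['Bar', 'Baar', 'Beer'],
                      ['188', '188.', '18']],
})

# same conversion as in the question
mapping = similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()
print(mapping['Pub'])  # ['Bar', 'Baar', 'Beer']
```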

Expected Output

North Indian Restaurant
South India  Restaurant
Mexican Restaurant
Italian  Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888

Any help is appreciated

Upvotes: 0

Views: 1407

Answers (1)

jezrael

Reputation: 862431

I think you can build a new dictionary of regex patterns, as in this answer, and pass it to replace:

d = {'Indian': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}

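# map every related word, as a regex that must be bounded by whitespace
# (or the start/end of the string), to its original word
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}

# df is the DataFrame holding the input strings in column 'col'
df['col'] = df['col'].replace(d1, regex=True)
print(df)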
                        col
0   North Indian Restaurant
1   South Indian Restaurant
2        Mexican Restaurant
3       Italian  Restaurant
4                  Cafe Pub
5                 Irish Pub
6               Maggiee Pub
7           Jacky Craft Pub
8               Bristo 1888
9               Bristo 1888
10              Bristo 1888
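A self-contained sketch of the same approach, runnable end to end. The helper name build_patterns is hypothetical; it escapes each related word (as EDIT1 below suggests) and accepts either comma-separated strings or lists, since to_dict() on the question's DataFrame yields lists.

```python
import re

import pandas as pd


def build_patterns(d):
    """Map a whitespace-bounded regex for each related word to its original word."""
    patterns = {}
    for original, related in d.items():
        # accept 'India, Ind, ind.' as well as ['India', 'Ind', 'ind.']
        words = related.split(',') if isinstance(related, str) else related
        for w in words:
            w = w.strip()
            if w:
                # re.escape keeps '.', '(' etc. literal; the lookarounds require
                # whitespace (or string start/end) on both sides of the word
                patterns[r'(?<!\S)' + re.escape(w) + r'(?!\S)'] = original.strip()
    return patterns


d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

# input strings from the question
df = pd.DataFrame({'col': [
    'North Indian Restaurant', 'South India  Hotel', 'Mexican Restrant',
    'Italian  Hotpot', 'Cafe Bar', 'Irish Pub', 'Maggiee Baar',
    'Jacky Craft Beer', 'Bristo 1889', 'Bristo 188', 'Bristo 188.',
]})

df['col'] = df['col'].replace(build_patterns(d), regex=True)
print(df)

# list values (as produced by to_dict() on the question's DataFrame) work too
pub_patterns = build_patterns({'Pub': ['Bar', 'Baar', 'Beer']})
pubs = pd.Series(['Cafe Bar', 'Irish Pub']).replace(pub_patterns, regex=True)
print(pubs.tolist())  # ['Cafe Pub', 'Irish Pub']
```

pandas applies the patterns in turn, so a replacement that itself matched a later pattern would be rewritten again; with this dictionary that does not happen. With the question's input, Bristo 1889 is left unchanged, because 1889 is not among the related words of 1888.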

EDIT (the code above wrapped in a function):

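def replace_words(df, d, col):
    # map each related word (as a whitespace-bounded regex) to its original word
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    # return the replaced column instead of modifying a global df
    return df[col].replace(d1, regex=True)

df['col'] = replace_words(df, d, 'col')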

EDIT1:

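If you get an error like:

regex error- missing ), unterminated subpattern at position 7

it is necessary to escape the related words with re.escape before they are used as regex patterns, because characters such as ( or . have a special meaning in a regex: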

import re

def replace_words(df, d, col):
    # re.escape makes characters such as '(' or '.' match literally
    d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    return df[col].replace(d1, regex=True)

df['col'] = replace_words(df, d, 'col')
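A small sketch of why the escaping matters, assuming a hypothetical related word '(Bar' that contains an unbalanced parenthesis. Without re.escape the pattern does not compile; an unescaped '.' compiles but silently matches any character.

```python
import re

# the same whitespace lookarounds used in the answer
prefix, suffix = r'(?<!\S)', r'(?!\S)'

# hypothetical related word that contains an unbalanced '('
word = '(Bar'

try:
    re.compile(prefix + word + suffix)
    raised = False
except re.error as exc:
    raised = True
    print('unescaped pattern fails:', exc)

# with re.escape the parenthesis is matched literally
escaped = re.compile(prefix + re.escape(word) + suffix)
print(escaped.sub('Pub', 'Cafe (Bar'))  # Cafe Pub

# '.' is a metacharacter too: unescaped 'ind.' also matches 'indx'
print(bool(re.fullmatch(prefix + 'ind.' + suffix, 'indx')))             # True
print(bool(re.fullmatch(prefix + re.escape('ind.') + suffix, 'indx')))  # False
```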

Upvotes: 2
