I have a dictionary that maps each word to its closest related words.
I want to replace the related words in each string with the original word. Currently I can replace words when a key has only one value, but I cannot replace them when a key has multiple values. How can this be done?
Example Input
North Indian Restaurant
South India Hotel
Mexican Restrant
Italian Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.
How the dictionary is made
import pandas as pd

y = list(word)
words = y
# model is a trained gensim model (e.g. Word2Vec); keep neighbours with similarity above 0.7
similar = [[item[0] for item in model.wv.most_similar(word) if item[1] > 0.7] for word in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']]
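For reference, the 0.7 thresholding step can be tried without a trained model. This is a minimal sketch, assuming a hypothetical neighbours dict that stands in for model.wv.most_similar(word), which in gensim returns a list of (word, similarity) pairs. The scores below are invented for illustration, and build_related is a hypothetical helper name:

```python
# Hypothetical stand-in for model.wv.most_similar(word):
# word -> [(related_word, similarity), ...]; the scores are invented for illustration.
neighbours = {
    'Indian': [('India', 0.91), ('Ind', 0.78), ('Italian', 0.55)],
    'Pub': [('Bar', 0.88), ('Baar', 0.74), ('Cafe', 0.62)],
}

def build_related(neighbours, threshold=0.7):
    """Keep only the related words whose similarity is above the threshold."""
    return {
        word: [other for other, score in pairs if score > threshold]
        for word, pairs in neighbours.items()
    }

related = build_related(neighbours)
print(related)  # {'Indian': ['India', 'Ind'], 'Pub': ['Bar', 'Baar']}
```

Building the dict directly like this also avoids the DataFrame round trip when only the mapping is needed.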
It is a DataFrame with two columns that hold lists:
Orginal_Word   Related_Words
[Indian]       [India, Ind, ind.]
[Restaurant]   [Hotel, Restrant, Hotpot]
[Pub]          [Bar, Baar, Beer]
[1888]         [188, 188., 18]
Dictionary
similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()
{'Indian': 'India, Ind, ind.',
'Restaurant': 'Hotel, Restrant, Hotpot',
'Pub': 'Bar, Baar, Beer',
'1888': '188, 188., 18'}
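The multi-valued keys become easy to handle once the mapping is flipped, so that every related word points to exactly one original word. Below is a minimal sketch in plain Python. invert_related and replace_tokens are hypothetical helper names, and the token approach assumes words are separated by whitespace (split/join collapses repeated spaces into one):

```python
def invert_related(d):
    """Map each comma-separated related word to its original word."""
    inverted = {}
    for original, related in d.items():
        for word in related.split(','):
            word = word.strip()           # drop the spaces around each entry
            if word:                      # ignore empty pieces, e.g. from a trailing comma
                inverted[word] = original.strip()
    return inverted

def replace_tokens(text, inverted):
    """Swap every whitespace-separated token that is a related word."""
    # split()/join() also normalise runs of whitespace to a single space
    return ' '.join(inverted.get(token, token) for token in text.split())

d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

inverted = invert_related(d)
print(inverted['Hotpot'])                             # Restaurant
print(replace_tokens('South India Hotel', inverted))  # South Indian Restaurant
```

Because whole tokens are compared, 'India' is never replaced inside 'Indian'.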
Expected Output
North Indian Restaurant
South Indian Restaurant
Mexican Restaurant
Italian Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888
Any help is appreciated.
I think you can use replace with a new dict of regex patterns, built as in this answer:
d = {'Indian': 'India, Ind, ind.',
'Restaurant': 'Hotel, Restrant, Hotpot',
'Pub': 'Bar, Baar, Beer',
'1888': '188, 188., 18'}
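# one regex per related word; (?<!\S) and (?!\S) require whitespace or a string boundary
# on both sides, so only whole words are replaced (e.g. 'India' does not match inside 'Indian')
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
# regex=True makes replace treat the dict keys as regular expressions
df['col'] = df['col'].replace(d1, regex=True)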
print (df)
col
0 North Indian Restaurant
1 South Indian Restaurant
2 Mexican Restaurant
3 Italian Restaurant
4 Cafe Pub
5 Irish Pub
6 Maggiee Pub
7 Jacky Craft Pub
8 Bristo 1888
9 Bristo 1888
10 Bristo 1888
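To see what the regex dict does, here is a sketch using only the standard re module. It applies each pattern in dict order, which I assume is close to how replace uses a regex dict. The words are escaped with re.escape as in EDIT1 below. Note that with escaping, 'Bristo 1889' stays unchanged because '1889' is not among the related words. The unescaped '188.' pattern only matches it because '.' matches any character.

```python
import re

d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

# One pattern per related word; (?<!\S) and (?!\S) require whitespace or a string
# boundary on both sides, so 'India' does not match inside 'Indian'.
d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1
      for k1, v1 in d.items() for k in v1.split(',')}

def replace_all(text, patterns):
    """Apply every pattern -> replacement pair in dict order."""
    for pattern, replacement in patterns.items():
        text = re.sub(pattern, replacement, text)
    return text

rows = ['North Indian Restaurant', 'South India Hotel', 'Mexican Restrant',
        'Italian Hotpot', 'Cafe Bar', 'Irish Pub', 'Maggiee Baar',
        'Jacky Craft Beer', 'Bristo 1889', 'Bristo 188', 'Bristo 188.']

results = [replace_all(row, d1) for row in rows]
for result in results:
    print(result)
```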
EDIT (the code above wrapped in a function):
def replace_words(d, col):
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
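An alternative worth considering, which is not part of the original answer, is to compile all related words into one alternation and replace in a single pass with a callback. That way, a replacement can never be rewritten again by a later pattern. This sketch uses only the standard re module, and build_replacer is a hypothetical name:

```python
import re

def build_replacer(d):
    """Return a function that replaces related words with their original word in one pass."""
    lookup = {}
    for original, related in d.items():
        for word in related.split(','):
            if word.strip():
                lookup[word.strip()] = original.strip()
    if not lookup:
        return lambda text: text  # nothing to replace
    # One alternation of all escaped related words, restricted to whole tokens.
    pattern = re.compile(r'(?<!\S)(?:' + '|'.join(map(re.escape, lookup)) + r')(?!\S)')
    # The callback looks up the matched token, so each token is rewritten at most once.
    return lambda text: pattern.sub(lambda m: lookup[m.group(0)], text)

d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

replace = build_replacer(d)
print(replace('South India Hotel'))  # South Indian Restaurant
print(replace('Bristo 188.'))        # Bristo 1888
```

In pandas, the returned function could be applied with df['col'].map(replace).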
EDIT1:
If you get errors like:
regex error- missing ), unterminated subpattern at position 7
you need to escape the related words with re.escape before building the regex keys:
import re
def replace_words(d, col):
    d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
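Why escaping matters: characters such as '.' and '(' have a special meaning in a regex, so an unescaped related word either matches too much or fails to compile. Below is a small illustration with the standard re module. The helper whole_token and the word '(Pub' are hypothetical. Because the '(?<!\S)' prefix is 7 characters long, an error "at position 7" likely means that one of the related words starts with an unbalanced '('.

```python
import re

def whole_token(word, escape=True):
    """Pattern that matches word only as a whole whitespace-delimited token."""
    body = re.escape(word) if escape else word
    return r'(?<!\S)' + body + r'(?!\S)'

# Unescaped, '.' in 'ind.' matches any character, so 'indo' is replaced as well.
loose = re.sub(whole_token('ind.', escape=False), 'Indian', 'indo')   # 'Indian'
# Escaped, only the literal token 'ind.' matches.
strict = re.sub(whole_token('ind.'), 'Indian', 'indo')                # 'indo'
print(loose, strict)

# A related word starting with an unbalanced '(' breaks the unescaped pattern,
# because the unterminated group starts right after the 7-character prefix.
error = None
try:
    re.compile(whole_token('(Pub', escape=False))
except re.error as exc:
    error = exc
print(error)

# Escaped, the same word compiles and is matched literally.
print(re.sub(whole_token('(Pub'), 'Pub', 'Irish (Pub'))  # Irish Pub
```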