Reputation: 113
I have a dictionary (key, value) and a dataframe using pandas.
mydict = {'KULAR LUMPUR' : 'MY',
'SINGAPORE' : 'SG',
'HONG KONG' : 'HK',
'VIETNAM': 'VN'}
and a dataframe with column ['Address']
Address
0 234 JALAN ST KULAR LUMPUR MALAYSIA
1 123 BUILDING STREET SINGAPORE
2 67 CANNING VALE, HONG KONG
How do I search through the dataframe to get the value from the dictionary if substring is found in the key of the dictionary.
e.g.
Address Code
0 234 JALAN ST KULAR LUMPUR MALAYSIA MY
1 123 BUILDING STREET SINGAPORE SG
2 67 CANNING VALE, HONG KONG HK
Upvotes: 0
Views: 2703
Reputation: 862511
Use str.extract
by regex
with keys of dictionary with map
:
df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
'123 BUILDING STREET SINGAPORE',
'67 CANNING VALE, HONG KONG']})
print (df)
Address
0 234 JALAN ST KULAR LUMPUR MALAYSIA
1 123 BUILDING STREET SINGAPORE
2 67 CANNING VALE, HONG KONG
mydict = {'KULAR LUMPUR' : 'MY',
'SINGAPORE' : 'SG',
'HONG KONG' : 'HK',
'VIETNAM': 'VN'}
pat = '|'.join(r"\b{}\b".format(x) for x in mydict.keys())
df['Code'] = df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)
print (df)
Address Code
0 234 JALAN ST KULAR LUMPUR MALAYSIA MY
1 123 BUILDING STREET SINGAPORE SG
2 67 CANNING VALE, HONG KONG HK
Explanation:
print (pat)
\bKULAR LUMPUR\b|\bSINGAPORE\b|\bHONG KONG\b|\bVIETNAM\b
\b
are called word boundaries for match words between \b
|
are for regex OR
Upvotes: 6