newtoCS
newtoCS

Reputation: 113

Pandas: Search if substring contains key in dictionary, and return value

I have a dictionary (key, value) and a dataframe using pandas.

mydict = {'KULAR LUMPUR' : 'MY',
            'SINGAPORE' : 'SG',
            'HONG KONG' : 'HK',
            'VIETNAM': 'VN'}

and a dataframe with column ['Address']

                              Address
0  234 JALAN ST KULAR LUMPUR MALAYSIA
1       123 BUILDING STREET SINGAPORE
2          67 CANNING VALE, HONG KONG

How do I search through the dataframe to get the value from the dictionary if substring is found in the key of the dictionary.

e.g.

                              Address Code
0  234 JALAN ST KULAR LUMPUR MALAYSIA   MY
1       123 BUILDING STREET SINGAPORE   SG
2          67 CANNING VALE, HONG KONG   HK

Upvotes: 0

Views: 2703

Answers (1)

jezrael
jezrael

Reputation: 862511

Use str.extract by regex with keys of dictionary with map:

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA', 
                               '123 BUILDING STREET SINGAPORE', 
                               '67 CANNING VALE, HONG KONG']})

print (df)
                              Address
0  234 JALAN ST KULAR LUMPUR MALAYSIA
1       123 BUILDING STREET SINGAPORE
2          67 CANNING VALE, HONG KONG

mydict = {'KULAR LUMPUR' : 'MY',
            'SINGAPORE' : 'SG',
            'HONG KONG' : 'HK',
            'VIETNAM': 'VN'}

pat = '|'.join(r"\b{}\b".format(x) for x in mydict.keys())
df['Code'] = df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)

print (df)
                              Address Code
0  234 JALAN ST KULAR LUMPUR MALAYSIA   MY
1       123 BUILDING STREET SINGAPORE   SG
2          67 CANNING VALE, HONG KONG   HK

Explanation:

print (pat)
\bKULAR LUMPUR\b|\bSINGAPORE\b|\bHONG KONG\b|\bVIETNAM\b

\b are called word boundaries for match words between \b
| are for regex OR

Upvotes: 6

Related Questions