user14613954

Reputation: 31

How to use a nested dict to replace strings in another column

I have a nested dictionary like the one below. When a name ends with one of the inner keys, I would like to replace that suffix with the corresponding inner value, but only when the row's country code equals the outer key of the dictionary that contains it.

country_dict = {
 'IND': {' PVT. LTD.': ' Pvt. Ltd.',
         ' pvt. Ltd': ' Pvt. Ltd.',
         ' PVT LTD': ' Pvt. Ltd.',
         ' L.L.P.': ' LLP',
         ' LTD.': ' Ltd.',
         ' LLP.': ' LLP',
         ' ltd': ' Ltd.',
         ' llp': ' LLP'},
 'GBR': {' P.L.C.': ' PLC',
         ' C.I.C.': ' CIC',
         ' p.l.c': ' PLC',
         ' c.i.c': ' CIC',
         ' s.e.': ' SE',
         ' PLC.': ' PLC'},
 'USA': {' LTD. CO.': ' Ltd. Co.',
         ' L.L.L.P.': ' LLLP',
         ' ltd. Co': ' Ltd. Co.',
         ' l.l.l.p': ' LLLP',
         ' L.L.P.': ' LLP',
         ' L.L.C.': ' LLC',
         ' l.l.p': ' LLP',
         ' l.l.c': ' LLC'}
}

My DataFrame has two columns, Name (the legal name) and Reg Country Code:

Name                 Reg Country Code
NexPoint LTD. CO.    USA
Silverplay P.L.C.    GBR
ALLOYS PVT. LTD.     IND
GALLIUM ltd.         IND
ELLIOTT s.e.         GBR

I used the code below. It replaces the suffix whenever the legal name ends with an inner key, but it does not check that the country code matches the outer key. Can someone please suggest a fix? (My actual list is large.)

for i in range(len(df)):
    for k1 in country_dict.items():
        if df.loc[i, 'Reg Country Code'] == k1:
            for k2, v2 in country_dict[k1].items():
                df.loc[df['Reg Country Code'] == k1, 'Name'] = [re.sub(k2, v, x) if x.endswith(k2) else x for x in df.loc[df['Reg Country Code'] == k1, 'Name']]

My expected output is:

Name                 Reg Country Code
NexPoint Ltd. Co.    USA
Silverplay PLC       GBR
ALLOYS Pvt. Ltd.     IND
GALLIUM Ltd.         IND
ELLIOTT SE           GBR

Upvotes: 2

Views: 215

Answers (1)

Vaishali

Reputation: 38415

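You can group the DataFrame by country code and apply that country's replacement dictionary to each group. Here d is the nested dictionary from the question, and the column names are upper-cased as in the output below.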

df['NAME'] = df.groupby('REG COUNTRY CODE')['NAME'].apply(lambda x: x.replace(d[x.name], regex=True))

    NAME                REG COUNTRY CODE
0   NexPoint Ltd. Co.   USA
1   Silverplay PLC      GBR
2   ALLOYS Pvt. Ltd.    IND
3   GALLIUM Ltd..       IND
4   ELLIOTT SE          GBR

Explanation:

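  • Inside apply, each group is passed to the lambda as a Series x, and x.name holds the group key (the country code in this case).

  • By using d[x.name], we look up the inner dictionary of replacements that belongs to that country code.

  • Setting regex to True makes replace treat the dictionary keys as regular-expression patterns and match them anywhere inside a name, instead of requiring the whole cell to equal the key. Because the keys are neither escaped nor anchored to the end of the string, ' ltd' also matches inside ' ltd.', which is why row 3 comes out as 'GALLIUM Ltd..'.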
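If the replacement should fire only when the key is literally the end of the name, as the question asks, one option is to skip regex and use plain suffix matching. The following is a minimal sketch, not the answer's method: normalize_suffix is a hypothetical helper name, country_dict is a trimmed copy of the question's mapping, and the sample rows are taken from the question.

```python
# Trimmed copy of the question's nested mapping (outer key = country code).
country_dict = {
    'IND': {' PVT. LTD.': ' Pvt. Ltd.', ' LTD.': ' Ltd.', ' ltd': ' Ltd.'},
    'GBR': {' P.L.C.': ' PLC', ' s.e.': ' SE'},
    'USA': {' LTD. CO.': ' Ltd. Co.', ' L.L.C.': ' LLC'},
}


def normalize_suffix(name, code, mapping=country_dict):
    """Replace a trailing suffix using only the inner dict for this country code."""
    suffixes = mapping.get(code, {})  # unknown country codes leave the name as-is
    # Try longer suffixes first so ' PVT. LTD.' is preferred over ' LTD.'.
    for old in sorted(suffixes, key=len, reverse=True):
        if name.endswith(old):  # literal, case-sensitive match at the end only
            return name[:-len(old)] + suffixes[old]
    return name


names = ['NexPoint LTD. CO.', 'Silverplay P.L.C.', 'ALLOYS PVT. LTD.',
         'GALLIUM ltd.', 'ELLIOTT s.e.']
codes = ['USA', 'GBR', 'IND', 'IND', 'GBR']
result = [normalize_suffix(n, c) for n, c in zip(names, codes)]

# With a DataFrame, the same call would be (assuming the question's column names):
# df['Name'] = [normalize_suffix(n, c)
#               for n, c in zip(df['Name'], df['Reg Country Code'])]
```

With strict suffix matching, 'GALLIUM ltd.' is left unchanged, because none of the IND keys is exactly ' ltd.' (the closest are ' ltd' and ' LTD.'). Getting 'GALLIUM Ltd.' as in the expected output would need an extra key such as ' ltd.' in the IND dictionary.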

Upvotes: 3
