SuperSaiyan

Reputation: 25

Finding partial strings using str.find() then replace values from dictionary

I need to replace the values in a column. The values need not be an exact match, so I use str.find(). Once a dictionary key is found in the string, the corresponding dictionary value should be written to the new column.

I achieved the desired result for one instance, but I need to do it multiple times.

I tried creating a function but it didn't work. It only worked for the last dictionary value.

dictionary  = {"AA" : "111", "BB" : "222", "CC": "333,444"}

#result = []
for k, v in dictionary.items():
    df["renamed"] = np.nan
    df.loc[(df["combined_topic"].str.find(k) != -1), "renamed"] = v
    #result.extend(df["renamed"].to_dict(orient="records"))

How should I fix my code? Or can you suggest a more efficient way to replace multiple values?

Expected output:

combined_topic          renamed
AA, harvard                 111
Diliman, Technology, BB     222
Cat, Dog, CC, Bull          333, 444


Upvotes: 1

Views: 215

Answers (1)

jezrael

Reputation: 863301

Use Series.str.extract to get the first matched dictionary key, then Series.map to translate it through the dict:

pat = '|'.join(dictionary)
df['renamed'] = df['combined_topic'].str.extract('('+ pat + ')', expand=False).map(dictionary)
print (df)
            combined_topic  renamed
0              AA, harvard      111
1  Diliman, Technology, BB      222
2       Cat, Dog, CC, Bull  333,444
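
A self-contained sketch of the same idea, in case it helps to run it end to end. The sample DataFrame is rebuilt from the question's expected output, and the extra "no match here" row is an assumption added only to show what happens to unmatched rows. The re.escape call is also an addition: it is not needed for keys like "AA", but it keeps keys containing regex metacharacters (".", "+", "(" and so on) from being read as pattern syntax.

```python
import re

import pandas as pd

dictionary = {"AA": "111", "BB": "222", "CC": "333,444"}

# Sample data taken from the question; the last row is a made-up non-match.
df = pd.DataFrame(
    {
        "combined_topic": [
            "AA, harvard",
            "Diliman, Technology, BB",
            "Cat, Dog, CC, Bull",
            "no match here",
        ]
    }
)

# Escape every key so it is matched literally, then join into one alternation.
pat = "|".join(re.escape(k) for k in dictionary)

# extract() returns the first key found in each string (NaN if none),
# and map() translates that key into its dictionary value.
df["renamed"] = (
    df["combined_topic"].str.extract("(" + pat + ")", expand=False).map(dictionary)
)
print(df)
```

Rows that contain none of the keys end up as NaN in renamed, since extract finds nothing to map.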

Your solution should use Series.str.contains, but the main fix is to remove df["renamed"] = np.nan from the loop, because it resets the column on every iteration, so only the matches for the last key survive:

for k, v in dictionary.items():
    df.loc[df["combined_topic"].str.contains(k), "renamed"] = v

Or:

for k, v in dictionary.items():
    df.loc[(df["combined_topic"].str.find(k) != -1), "renamed"] = v
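
For completeness, a runnable sketch of the loop version under a few assumptions: the sample data is rebuilt from the question (plus one made-up non-matching row), the column is created once before the loop with object dtype so that string values can be assigned without a dtype mismatch, and regex=False is passed so each key is treated as a plain substring rather than a pattern.

```python
import pandas as pd

dictionary = {"AA": "111", "BB": "222", "CC": "333,444"}

# Sample data taken from the question; the last row is a made-up non-match.
df = pd.DataFrame(
    {
        "combined_topic": [
            "AA, harvard",
            "Diliman, Technology, BB",
            "Cat, Dog, CC, Bull",
            "no match here",
        ]
    }
)

# Create the target column once, outside the loop, so earlier matches
# are not wiped out on later iterations.
df["renamed"] = pd.Series(index=df.index, dtype="object")

for k, v in dictionary.items():
    # regex=False makes this a literal substring test, like str.find(k) != -1.
    mask = df["combined_topic"].str.contains(k, regex=False)
    df.loc[mask, "renamed"] = v

print(df)
```

One difference from the str.extract approach: if a row contains several keys, the loop keeps the value of the last matching key in dictionary order, whereas str.extract keeps the key that appears first in the string.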

Upvotes: 2
