Replace values from pandas dataset with dictionary

Question

I am extracting a column from an Excel document with pandas. After that, for each row of the selected column, I want to replace every key found in a list of dictionaries with its corresponding value.

import pandas as pd
file_loc = "excelFile.xlsx"
df = pd.read_excel(file_loc, usecols = "C")

In this case, the column I work with is df['Q10'], and it has more than 10k rows.

Traditionally, if I want to replace a value in df, I use:

df['Q10'].str.replace('val1', 'val2')

Now, I have a dictionary of words like:

mydic = [
   {
      'key': "wasn't",
      'value': 'was not'
   },
   {
      'key': "I'm",
      'value': 'I am'
   },
   # ... plus many more key/value pairs
]

Currently, I have a function that iterates over "mydic" and replaces all occurrences one by one.

def replaceContractions(df, mydic):
    for cont in mydic:
        df.str.replace(cont['key'], cont['value'])

Next, I call this function, passing my dataframe column and mydic:

replaceContractions(df['Q10'], mydic)

First problem: this is very expensive, because mydic has a lot of items and the whole data set is scanned once for each of them.

Second problem: it doesn't seem to work :(

Any ideas?

cs95 · Accepted Answer

Convert your "dictionary" to a more friendly format:

m = {d['key'] : d['value'] for d in mydic}

m   
{"I'm": 'I am', "wasn't": 'was not'}

Next, call replace with the regex switch and pass m to it.

df['Q10'] = df['Q10'].replace(m, regex=True)
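As a self-contained sketch of the two steps above, here is the same approach on a small made-up column standing in for the Excel data (the sample rows are an assumption, not from the question). Because regex=True treats every key as a regular expression, the sketch also passes the keys through re.escape; that is an extra precaution of mine in case a key contains special characters, not something the two contractions shown strictly need.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the Excel column df['Q10'].
df = pd.DataFrame({"Q10": ["I'm sure it wasn't me", "nothing to change", "I'm here"]})

mydic = [
    {"key": "wasn't", "value": "was not"},
    {"key": "I'm", "value": "I am"},
]

# With regex=True the keys are interpreted as regular expressions,
# so escape them to make sure they are matched literally.
m = {re.escape(d["key"]): d["value"] for d in mydic}

# replace returns a new Series, so assign it back to the column.
df["Q10"] = df["Q10"].replace(m, regex=True)
print(df["Q10"].tolist())
```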

replace accepts a dictionary of key-replacement pairs, and it should be much faster than looping over the pairs one at a time.
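As for why the loop in the question seemed to do nothing: Series.str.replace returns a new Series rather than modifying the column in place, and the function neither kept nor returned that result. For comparison, a minimal corrected version of the loop might look like the sketch below; the name replace_contractions and the sample data are hypothetical, and the dictionary-based replace above remains the simpler option.

```python
import pandas as pd

def replace_contractions(series, pairs):
    """Return a new Series with every 'key' in pairs replaced by its 'value'."""
    for pair in pairs:
        # str.replace returns a new Series, so keep the result on each pass.
        # regex=False makes the keys match as plain substrings.
        series = series.str.replace(pair["key"], pair["value"], regex=False)
    return series

# Hypothetical sample data.
s = pd.Series(["I'm sure it wasn't me", "I'm here"])
pairs = [
    {"key": "wasn't", "value": "was not"},
    {"key": "I'm", "value": "I am"},
]

result = replace_contractions(s, pairs)
print(result.tolist())
```

The caller still has to assign the returned Series back, for example df['Q10'] = replace_contractions(df['Q10'], mydic).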
