How to efficiently replace values in a dataframe by iterating through a dictionary?

Question

I have a dataframe of salary ranges like so:

import pandas as pd
df = pd.DataFrame(columns=['Salary'])
df.Salary = ['30,000-39,999', '5,000-7,499', '250,000-299,999', '4,000-4,999', '60,000-69,999', '10,000-14,999', '80,000-89,999', '$0-999', '2,000-2,999', '70,000-79,999', '90,000-99,999', '125,000-149,999', '$0-999', '$0-999', '40,000-49,999', '20,000-24,999', '125,000-149,999', '$0-999', '10,000-14,999', '15,000-19,999', '20,000-24,999', '100,000-124,999', '$0-999']
df

I want to replace these salary-range strings with numbers, where 1 denotes $0-999, 2 denotes 1,000-1,999, and so on. Below is my code: I build a dictionary mapping the strings to numbers and use two for loops, one to iterate through each row in the dataframe and one to iterate through each key in the dictionary:

salary_dict = {'$0-999':1, '1,000-1,999':2, '2,000-2,999':3, '3,000-3,999':4, '4,000-4,999':5, 
           '5,000-7,499':6, '7,500-9,999':7, '10,000-14,999':8, '15,000-19,999':9, '20,000-24,999':10, 
           '25,000-29,999':11, '30,000-39,999':12, '40,000-49,999':13, '50,000-59,999':14, '60,000-69,999':15, 
           '70,000-79,999':16, '80,000-89,999':17, '90,000-99,999':18, '100,000-124,999':19, '125,000-149,999':20, 
           '150,000-199,999':21, '200,000-249,999':22, '250,000-299,999':23, '300,000-500,000':24, '> $500,000':25}

for i in range(len(df)):
    for key in salary_dict:
        if df.Salary[i]==key:
            df.Salary[i] = salary_dict[key]
            break

df

This is ok for small dataframes, but with bigger (longer) dataframes, the code takes a long time to finish running. How do I optimize it?

Poojan · Accepted Answer

A much faster way is to use the Series apply function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html
Calling apply on a Series applies the given function to each element.
Here we map each element of df['Salary'] to its corresponding value in the dictionary, so the inner loop over the keys is replaced by a single dictionary lookup per row.
If the part lambda x: salary_dict.get(x, x) is unfamiliar, look into Python lambdas.
The dictionary's get method is used as a safeguard: if a key is not in the dictionary, get returns the original value instead of raising a KeyError.

df['Salary'] = df['Salary'].apply(lambda x: salary_dict.get(x, x))
print(df)

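Output: the Salary column now contains the integer codes instead of the range strings; for example, the first five rows are 12, 6, 23, 5 and 15.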
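
A related option, not part of the answer above, is Series.map, which accepts a dictionary directly. The sketch below is only a minimal illustration: it assumes a shortened copy of salary_dict, a made-up 'unknown' label, and a hypothetical SalaryCode column name. Unlike salary_dict.get(x, x), map leaves NaN where a label has no entry in the dictionary, which makes unmapped rows easy to spot.

```python
import pandas as pd

# Shortened copy of the question's mapping (illustrative subset only).
salary_dict = {'$0-999': 1, '1,000-1,999': 2, '2,000-2,999': 3, '5,000-7,499': 6}

# 'unknown' is a made-up label used to show what happens to unmapped values.
df = pd.DataFrame({'Salary': ['5,000-7,499', '$0-999', 'unknown', '2,000-2,999']})

# Series.map looks each value up in the dict; labels without an entry become NaN.
codes = df['Salary'].map(salary_dict)

# Collect the labels that had no entry in the dictionary.
unmapped = df.loc[codes.isna(), 'Salary'].tolist()

# Store the codes in a nullable integer column so missing codes stay missing.
# 'SalaryCode' is a hypothetical column name chosen for this sketch.
df['SalaryCode'] = codes.astype('Int64')

print(df)
print('Unmapped labels:', unmapped)
```

Whether map or apply is faster on a given dataframe is worth timing on your own data; either one avoids the nested Python loops from the question.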
