Don Quixote

Reputation: 145

What is the best way to replace values in a Python pandas DataFrame to clean up the data?

I have a DataFrame where the column 'Name' has some errors in it. I have created a dictionary with the incorrect spellings as the keys and the correct spellings as the values. What is the best way to replace the incorrect spellings with the correct ones? This is what I did:

for incorrect, correct in incorrect_to_correct.items():
    mask = s_df['Name'] == incorrect
    s_df.loc[mask, 'Name'] = correct

Is there a better way of doing this? I was told that if you are using a for loop with pandas, you should generally rethink what you are doing. Is this dictionary method "wrong"? I am new to pandas, and any help would be appreciated. Thanks!

Upvotes: 3

Views: 1876

Answers (1)

jezrael

Reputation: 863146

I think you can use Series.replace with a dict:

df.Name = df.Name.replace(incorrect_to_correct)

Sample:

import pandas as pd

df = pd.DataFrame({'Name' : ["john","mary","jon", "mar"]})
print(df)
   Name
0  john
1  mary
2   jon
3   mar

incorrect_to_correct = {'jon':'john', 'mar':'mary'}

df.Name = df.Name.replace(incorrect_to_correct)
print(df)
   Name
0  john
1  mary
2  john
3  mary
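
For comparison, here is a minimal sketch that runs the loop from the question and the dict-based replace side by side. The extra row "marjorie" is made up for illustration; it is there to show that a dict passed to replace matches whole values only, so longer names that merely contain a key are left alone. The map variant at the end is an alternative I am adding, not something from the question: it returns NaN for values missing from the dict, so it needs a fillna with the original column.

```python
import pandas as pd

# Hypothetical sample data; "marjorie" is added only to show that
# partial matches are not touched.
s_df = pd.DataFrame({"Name": ["john", "mary", "jon", "mar", "marjorie"]})
incorrect_to_correct = {"jon": "john", "mar": "mary"}

# Loop version from the question: one boolean mask per dictionary entry.
looped = s_df.copy()
for incorrect, correct in incorrect_to_correct.items():
    mask = looped["Name"] == incorrect
    looped.loc[mask, "Name"] = correct

# Dict-based replace: a single call, values not in the dict stay as they are.
replaced = s_df.copy()
replaced["Name"] = replaced["Name"].replace(incorrect_to_correct)

# Alternative with map: unmatched values become NaN, so fall back
# to the original value with fillna.
mapped = s_df["Name"].map(incorrect_to_correct).fillna(s_df["Name"])

print(replaced)
```

So the loop is not wrong and gives the same result here; replace just says the same thing in one line.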

Upvotes: 5
