I have a DataFrame where the column 'Name' contains some misspellings. I have created a dictionary with the incorrect spellings as the keys and the correct spellings as the values. What is the best way to replace the incorrect spellings with the correct ones? This is what I did:
for incorrect, correct in incorrect_to_correct.items():
    mask = s_df['Name'] == incorrect
    s_df.loc[mask, 'Name'] = correct
Is there a better way of doing this? I was told that, generally, if you are using a for loop with pandas, you should rethink what you are doing. Is there a better way to clean up the data? Is this dictionary method "wrong"? I am new to pandas and any help would be appreciated. Thanks!
I think you can use replace with the dict:
df.Name = df.Name.replace(incorrect_to_correct)
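With a plain dict and the default regex=False, replace only swaps values that match a key exactly; a name that merely contains a misspelling as a substring is left alone. A small sketch with hypothetical data to show this:

```python
import pandas as pd

# Hypothetical names: "jonathan" contains "jon" but is not an exact match
s = pd.Series(["jon", "jonathan", "mar", "mary"])
fixes = {"jon": "john", "mar": "mary"}

# Whole-value replacement: only cells equal to a key are changed
cleaned = s.replace(fixes)
print(cleaned.tolist())  # ['john', 'jonathan', 'mary', 'mary']
```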
Sample:
import pandas as pd

df = pd.DataFrame({'Name' : ["john","mary","jon", "mar"]})
print(df)
Name
0 john
1 mary
2 jon
3 mar
incorrect_to_correct = {'jon':'john', 'mar':'mary'}
df.Name = df.Name.replace(incorrect_to_correct)
print(df)
Name
0 john
1 mary
2 john
3 mary
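An alternative that is sometimes used instead of replace (not part of the answer above) is Series.map followed by fillna: map looks each name up in the dict and yields NaN where there is no entry, and fillna puts the original value back in those places. Whether it is faster than replace depends on the data, so it is worth timing on your own frame. A minimal sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["john", "mary", "jon", "mar"]})
incorrect_to_correct = {"jon": "john", "mar": "mary"}

# map() returns the dict value for each key found, NaN for names not in the dict
mapped = df["Name"].map(incorrect_to_correct)
# fillna() restores the original name wherever no correction was found
df["Name"] = mapped.fillna(df["Name"])
print(df["Name"].tolist())  # ['john', 'mary', 'john', 'mary']
```

Both approaches are vectorized, so the dictionary itself is fine; it is only the explicit Python loop over its items that they avoid.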