Reputation: 1155
I am trying to remove specific typos from a dictionary of dataframes, which looks like this:
import pandas as pd
data = {'dataframe_1':pd.DataFrame({'col1': ['John', 'Ashley'], 'col2': ['+10', '-1']}), 'dataframe_2':pd.DataFrame({'col3': ['Italy', 'Brazil', 'Japan'], 'col4': ['Milan', 'Rio do Jaineiro', 'Tokio'], 'percentage':['+95%', '≤0%', '80%+']})}
The function remove_typos() is meant to remove specific typos; however, when applied, it returns a corrupted dataframe.
def remove_typos(string):
    # remove '+' and '≤'
    string = string.replace('+', '')
    string = string.replace('≤', '')
    return string
# store remove_typos() output in a dictionary of dataframes
cleaned_df = pd.concat(data.values()).pipe(remove_typos)
Console Output:
#     col1  col2    col3             col4  percentage
#0    John   +10     NaN              NaN         NaN
#1  Ashley    -1     NaN              NaN         NaN
#0     NaN   NaN   Italy            Milan        +95%
#1     NaN   NaN  Brazil  Rio do Jaineiro         ≤0%
#2     NaN   NaN   Japan            Tokio        80%+
The idea is that the function returns a cleaned df where each dataframe is represented by a dictionary key:
data['dataframe_1']
#     col1  col2
#0    John    10
#1  Ashley    -1
Is there another way to apply this function across a dict of dataframes?
Upvotes: 1
Views: 230
Reputation: 75080
There is no harm in using a loop over a dictionary (as opposed to looping over a dataframe):
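data1 = {}
for k, v in data.items():
    # pick only the object (string) columns
    v1 = v.select_dtypes("O")
    # note: DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map
    v = v.assign(**v1.applymap(remove_typos))
    data1[k] = v
print(data1)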
{'dataframe_1':      col1  col2
0    John    10
1  Ashley    -1, 'dataframe_2':      col3             col4  percentage
0   Italy            Milan         95%
1  Brazil  Rio do Jaineiro          0%
2   Japan            Tokio         80%}
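The same loop can also be written without `select_dtypes` or `applymap`, which may be handy if the pandas version is unknown. This is only a minimal sketch: `clean_frames` is a hypothetical helper name (not part of the answer above), and it assumes that only string cells should be cleaned, so anything else (numbers, NaN) is passed through unchanged.

```python
import pandas as pd


def remove_typos(string):
    # remove '+' and '≤'
    return string.replace('+', '').replace('≤', '')


def clean_frames(frames):
    """Hypothetical helper: return a new dict of cleaned copies of each dataframe."""
    cleaned = {}
    for name, df in frames.items():
        df = df.copy()  # leave the original dataframe untouched
        for col in df.columns:
            # only string cells are cleaned; other values pass through as-is
            df[col] = df[col].map(
                lambda x: remove_typos(x) if isinstance(x, str) else x
            )
        cleaned[name] = df
    return cleaned


data = {
    'dataframe_1': pd.DataFrame({'col1': ['John', 'Ashley'], 'col2': ['+10', '-1']}),
    'dataframe_2': pd.DataFrame({
        'col3': ['Italy', 'Brazil', 'Japan'],
        'col4': ['Milan', 'Rio do Jaineiro', 'Tokio'],
        'percentage': ['+95%', '≤0%', '80%+'],
    }),
}

cleaned = clean_frames(data)
print(cleaned['dataframe_1'])
```

Each key still maps to its own dataframe, so `cleaned['dataframe_1']` and `cleaned['dataframe_2']` can be accessed exactly like the entries of the original `data`.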
Upvotes: 2
Reputation: 71689
We can replace the values inside a dict comprehension:
data = {k: v.replace([r'\+', '≤'], '', regex=True) for k, v in data.items()}
>>> data['dataframe_1']
     col1  col2
0    John    10
1  Ashley    -1
>>> data['dataframe_2']
     col3             col4  percentage
0   Italy            Milan         95%
1  Brazil  Rio do Jaineiro          0%
2   Japan            Tokio         80%
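The `regex=True` flag matters here. Without it, `DataFrame.replace` compares whole cell values rather than substrings. That is also why piping a dataframe into `remove_typos` leaves cells such as '+95%' unchanged: inside the function, `string.replace` then resolves to `DataFrame.replace`. The sketch below illustrates the difference on a toy frame `df`, which is an assumption for illustration and not part of the question's data. A character class is used as an alternative to escaping the '+'.

```python
import pandas as pd

# toy data (hypothetical): the last cell consists of a single '+'
df = pd.DataFrame({'percentage': ['+95%', '≤0%', '80%+', '+']})

# without regex=True only cells that are exactly '+' are replaced
whole = df.replace('+', '')
print(whole['percentage'].tolist())  # ['+95%', '≤0%', '80%+', '']

# with regex=True the pattern is applied to substrings;
# inside a character class '+' is literal, so no backslash is needed
sub = df.replace(r'[+≤]', '', regex=True)
print(sub['percentage'].tolist())  # ['95%', '0%', '80%', '']
```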
Upvotes: 3