Reputation: 5031
I have data in many columns like the following in pandas dataframe:
col1     | col2     | ... | col99     | col100
MBs      | Gigabytes| ... | MBs       | MBs
Megabytes| GBs      | ... | Megabytes | Gigabytes
GB       | Megabytes| ... | Gigabytes | Gigabytes
GBs      | GB       | ... | MBs       | Gigabytes
Gigabytes| Megabytes| ... | Gigabytes | Megabytes
I also have a dictionary that maps equivalent values. For example:
mapping = {'Megabytes':'MB', 'Gigabytes':'GB', 'MBs':'MB','GBs':'GB', 'GB':'GB',}
I want to replace each value in every column with its mapped value from the dict. I am currently trying the following, but I get an error. The expected output is:
col1| col2| ... | col99| col100
MB  | GB  | ... | MB   | MB
MB  | GB  | ... | MB   | GB
GB  | MB  | ... | GB   | GB
GB  | GB  | ... | MB   | GB
GB  | MB  | ... | GB   | MB
# My current implementation
df = df.apply(lambda x: x.astype(str).replace('GBs', 'GB').replace('MBs', 'MB').replace('Megabytes', 'MB').replace('Gigabytes', 'GB'))
Can someone guide me to a correct and faster way of doing this?
Upvotes: 1
Views: 1102
Reputation: 294586
pd.DataFrame.replace
can take a dictionary of dictionaries, where the first level of keys specifies the column each replacement applies to.
We can use a dictionary comprehension to restrict this to columns of dtype == object:
df.replace({c: mapping for c in df if df[c].dtype == object})
col1 col2 col99 col100
0 MB GB MB MB
1 MB GB MB GB
2 GB MB GB GB
3 GB GB MB GB
4 GB MB GB MB
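The approach above can be reproduced end to end on a small frame; the `size` column below is a made-up numeric column added to show that non-object columns pass through untouched:

```python
import pandas as pd

# Small reproduction of the question's data, plus a numeric column.
df = pd.DataFrame({
    'col1': ['MBs', 'Megabytes', 'GB', 'GBs', 'Gigabytes'],
    'col2': ['Gigabytes', 'GBs', 'Megabytes', 'GB', 'Megabytes'],
    'size': [1, 2, 3, 4, 5],
})

mapping = {'Megabytes': 'MB', 'Gigabytes': 'GB', 'MBs': 'MB',
           'GBs': 'GB', 'GB': 'GB'}

# Build {column -> mapping} for object-dtype columns only, then replace.
result = df.replace({c: mapping for c in df if df[c].dtype == object})
print(result)
```

Because the replacement dict is keyed by column name, `size` never enters `replace` and keeps its integer dtype.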
Upvotes: 3
Reputation: 210982
Try this:
df.loc[:, df.dtypes=='object'] = df.select_dtypes(['object']).replace(mapping, regex=True)
This will apply mapping only to string (object) columns.
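A minimal sketch of that in-place assignment, using a hypothetical two-column frame (`unit` is text, `count` is numeric and should be left alone):

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one object column, one numeric column.
df = pd.DataFrame({'unit': ['MBs', 'Gigabytes', 'GB'],
                   'count': [10, 20, 30]})

mapping = {'Megabytes': 'MB', 'Gigabytes': 'GB', 'MBs': 'MB',
           'GBs': 'GB', 'GB': 'GB'}

# Select only the object-dtype columns, replace there, and assign back.
df.loc[:, df.dtypes == 'object'] = (
    df.select_dtypes(['object']).replace(mapping, regex=True)
)
print(df)
```

The boolean mask `df.dtypes == 'object'` and `select_dtypes(['object'])` pick out the same columns, so the assignment lines up column-for-column.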
If all your columns are of string (object) dtype:
df = df.replace(mapping, regex=True)
or, as @JohnGalt proposed in the comments:
df = df.applymap(lambda x: mapping[x])
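Note that `mapping[x]` raises a `KeyError` for any value not present in the dict; a safer variant (a sketch, with a made-up unmapped value to illustrate) falls back to the original value via `dict.get`:

```python
import pandas as pd

# 'unknown' is a deliberately unmapped value.
df = pd.DataFrame({'col1': ['MBs', 'Gigabytes'],
                   'col2': ['unknown', 'GB']})

mapping = {'Megabytes': 'MB', 'Gigabytes': 'GB', 'MBs': 'MB',
           'GBs': 'GB', 'GB': 'GB'}

# mapping[x] would raise KeyError on 'unknown'; .get(x, x) keeps it as-is.
result = df.applymap(lambda x: mapping.get(x, x))
print(result)
```

(In recent pandas versions, `DataFrame.map` is the preferred spelling of `applymap`.)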
Upvotes: 2