Reputation: 2499
I am using the following code to replace every value containing - with NaN and to remove every , from my dataframe columns:
df[['sale_price','mrp', 'discount', 'ratings', 'stars']]=df[['sale_price','mrp', 'discount', 'ratings', 'stars']].applymap(lambda r: np.nan if '-' in str(r) else str(r).replace(',', ''))
Some cells contain "nan" (not np.nan, just the string nan). To remove those as well, I do:
useless_strings=['-','nan']
df[['sale_price','mrp', 'discount', 'ratings', 'stars']]=df[['sale_price','mrp', 'discount', 'ratings', 'stars']].applymap(lambda r: np.nan if any(xx in str(r) for xx in useless_strings) else str(r).replace(',', ''))
This does not remove those "nan" strings. What's wrong?
Upvotes: 1
Views: 96
Reputation: 862841
Use DataFrame.replace with regex=True, with the substrings to replace defined in a dictionary:
import numpy as np
import pandas as pd

df = pd.DataFrame([['10,4','-','nan',5,'kkk-oo']],
                  columns=['sale_price','mrp', 'discount', 'ratings', 'stars'])
print (df)
  sale_price mrp discount  ratings   stars
0       10,4   -      nan        5  kkk-oo
useless_strings=['-','nan']
d = dict.fromkeys(useless_strings, np.nan)
d[','] = ''
print (d)
{'-': nan, 'nan': nan, ',': ''}
cols = ['sale_price','mrp', 'discount', 'ratings', 'stars']
df[cols] = df[cols].replace(d, regex=True)
print (df)
  sale_price  mrp  discount  ratings  stars
0        104  NaN       NaN        5    NaN
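If you would rather keep an explicit per-cell function, as in the question, here is a minimal sketch of that alternative. It uses the same sample row as above; the helper name clean and the follow-up conversion with pd.to_numeric are my own additions for illustration, not something the question asked for. It uses Series.map per column, so it does not depend on DataFrame.applymap.

```python
import numpy as np
import pandas as pd

# Same sample row as in the answer above.
df = pd.DataFrame([['10,4', '-', 'nan', 5, 'kkk-oo']],
                  columns=['sale_price', 'mrp', 'discount', 'ratings', 'stars'])
cols = ['sale_price', 'mrp', 'discount', 'ratings', 'stars']
useless_strings = ['-', 'nan']

def clean(value):
    """Hypothetical helper: NaN if any useless substring occurs, else drop commas."""
    text = str(value)
    if any(s in text for s in useless_strings):
        return np.nan
    return text.replace(',', '')

# Apply the helper cell by cell, one column at a time.
cleaned = df[cols].apply(lambda col: col.map(clean))

# Optional extra step (an assumption about what you want next):
# turn the cleaned strings into numbers, leaving NaN where parsing fails.
numeric = cleaned.apply(pd.to_numeric, errors='coerce')

print(cleaned)
print(numeric)
```

Note that treating any cell containing - as missing also discards negative numbers such as -5, so this only fits data where - is purely a placeholder.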
Upvotes: 1