AmanArora

Reputation: 2499

Check if either of two substrings exists in a string

I am using the following code to replace every value containing - with np.nan and to remove all , from my dataframe columns:

df[['sale_price','mrp', 'discount', 'ratings', 'stars']]=df[['sale_price','mrp', 'discount', 'ratings', 'stars']].applymap(lambda r: np.nan if '-' in str(r) else str(r).replace(',', ''))

Some values are the string "nan" (not np.nan, just the string). To replace those with np.nan as well, I do:

useless_strings=['-','nan']
df[['sale_price','mrp', 'discount', 'ratings', 'stars']]=df[['sale_price','mrp', 'discount', 'ratings', 'stars']].applymap(lambda r: np.nan if any(xx in str(r) for xx in useless_strings) else str(r).replace(',', ''))

This does not remove those "nan" strings. What's wrong?

Upvotes: 1

Views: 96

Answers (1)

jezrael

Reputation: 862841

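Use DataFrame.replace with regex=True and a dictionary that maps each substring to its replacement. A cell that contains a key mapped to np.nan becomes NaN as a whole, while ',' is simply removed from the string: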

import numpy as np
import pandas as pd

df = pd.DataFrame([['10,4','-','nan',5,'kkk-oo']],
                  columns=['sale_price','mrp', 'discount', 'ratings', 'stars'])
print (df)
  sale_price mrp discount  ratings   stars
0       10,4   -      nan        5  kkk-oo


useless_strings=['-','nan']
d = dict.fromkeys(useless_strings, np.nan)
d[','] = ''
print (d)
{'-': nan, 'nan': nan, ',': ''}

cols = ['sale_price','mrp', 'discount', 'ratings', 'stars']
df[cols] = df[cols].replace(d, regex=True)
print (df)
  sale_price  mrp  discount  ratings  stars
0        104  NaN       NaN        5    NaN
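
One thing to keep in mind: with regex=True the dictionary keys are treated as regular expressions, so a substring containing characters such as . or + would need escaping. As a rough alternative sketch (not the method above, and only lightly considered), the same cleanup can be written with the vectorized string methods Series.str.contains and Series.str.replace. The helper name clean_columns is made up for illustration, and unlike DataFrame.replace it converts every surviving value to a string, so the integer 5 becomes '5', as the lambda in the question also does.

```python
import re

import pandas as pd


def clean_columns(df, cols, useless_strings=("-", "nan")):
    """Return a copy of df where, in the given columns, any cell containing
    one of useless_strings becomes NaN and commas are removed elsewhere.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Escape each substring so it is matched literally, then OR them together.
    pattern = "|".join(re.escape(s) for s in useless_strings)
    out = df.copy()
    for col in cols:
        text = out[col].astype(str)
        # True where the cell contains any of the unwanted substrings.
        has_useless = text.str.contains(pattern, regex=True)
        # Remove commas literally (no regex needed for a plain character).
        without_commas = text.str.replace(",", "", regex=False)
        # mask() puts NaN wherever the condition is True.
        out[col] = without_commas.mask(has_useless)
    return out


cols = ['sale_price', 'mrp', 'discount', 'ratings', 'stars']
df = pd.DataFrame([['10,4', '-', 'nan', 5, 'kkk-oo']], columns=cols)
cleaned = clean_columns(df, cols)
print(cleaned)
```

Because the pattern is built with re.escape, extra entries can be added to useless_strings without worrying about regex metacharacters.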
    

Upvotes: 1
