Reputation: 523
Problem: Polluted DataFrame.
Details: The frame consists of NaNs, string values whose meaning I know, and numeric values.
Task: Replace the numeric values with NaNs.
Example
import numpy as np
import pandas as pd
df = pd.DataFrame([['abc', 'cdf', 1], ['k', 'sum', 'some'], [1000, np.nan, 'nothing']])
out:
      0    1        2
0   abc  cdf        1
1     k  sum     some
2  1000  NaN  nothing
Attempt 1 (Does not work, because regex only looks at string cells)
df.replace({r'\d+': np.nan}, regex=True)
out:
      0    1        2
0   abc  cdf        1
1     k  sum     some
2  1000  NaN  nothing
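To see why the pattern has nothing to match, it can help to look at the Python type of each cell. A small sketch (the name cell_types is only for illustration): the 1 and the 1000 are stored as integers, not as strings, so a string pattern never gets applied to them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['abc', 'cdf', 1], ['k', 'sum', 'some'], [1000, np.nan, 'nothing']])

# Map every cell to the name of its Python type (illustrative helper frame).
cell_types = df.apply(lambda col: col.map(lambda v: type(v).__name__))
print(cell_types)
```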
Preliminary Solution
# Collect every distinct value that occurs in the frame.
val_set = set()
for row in df.values:
    val_set.update(row)

def dis_nums(myset):
    # Split the values into strings and a {non-string: NaN} replacement dict.
    str_s = set()
    num_replace_dict = {}
    for val in myset:
        if isinstance(val, str):
            str_s.add(val)
        else:
            num_replace_dict[val] = np.nan
    return str_s, num_replace_dict

strs, rpl_dict = dis_nums(val_set)
df.replace(rpl_dict, inplace=True)
out:
     0    1        2
0  abc  cdf      NaN
1    k  sum     some
2  NaN  NaN  nothing
Question: Is there an easier, more pleasant solution?
Upvotes: 1
Views: 6872
Reputation: 11907
You can loop over the columns and check each item: if it is an integer or a float, replace it with np.nan. This is easily done with the map function applied to each column. You can change the condition of the if to cover any data type you want.
for x in df.columns:
    df[x] = df[x].map(lambda item: np.nan if type(item) == int or type(item) == float else item)
This is a naive approach, and there are probably better solutions than this.
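One possible variant without the explicit loop, as a sketch only: build a boolean mask of the cells that hold strings and let DataFrame.where blank out everything else. This assumes that "keep strings, drop the rest" is the rule you want; the names is_str and cleaned are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['abc', 'cdf', 1], ['k', 'sum', 'some'], [1000, np.nan, 'nothing']])

# Boolean mask: True where the cell holds a str, False for numbers and NaN.
is_str = df.apply(lambda col: col.map(lambda v: isinstance(v, str))).astype(bool)

# where() keeps the cells where the mask is True and puts NaN everywhere else.
cleaned = df.where(is_str)
print(cleaned)
```

Unlike the type(item) == int check, this also catches other numeric types such as NumPy integers, because it tests for what should be kept rather than for what should be removed.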
Upvotes: 1
Reputation: 3103
You can do a round trip: convert everything to str, replace the values, and convert back.
df.astype('str').replace({r'\d+': np.nan, 'nan': np.nan}, regex=True).astype('object')
# The 'nan' entry makes sure already existing np.nan values are not lost
Output
     0    1        2
0  abc  cdf      NaN
1    k  sum     some
2  NaN  NaN  nothing
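One thing to watch with the round trip: neither pattern is anchored, so a text cell that merely contains a digit, or a word that contains "nan", may get blanked as well. Below is a minimal sketch of a stricter per-cell variant, assuming that only non-strings and strings made up entirely of digits should become NaN. blank_numeric is a hypothetical helper, and the sample frame is altered ('abc1' and '42' are invented values) to show the difference.

```python
import re

import numpy as np
import pandas as pd


def blank_numeric(value):
    """Return NaN for non-strings and all-digit strings, else the value unchanged."""
    if isinstance(value, str) and not re.fullmatch(r'\d+', value):
        return value
    return np.nan


# Altered sample data: 'abc1' contains a digit, '42' is a number stored as text.
df = pd.DataFrame([['abc1', 'cdf', 1], ['k', '42', 'some'], [1000, np.nan, 'nothing']])

cleaned = df.apply(lambda col: col.map(blank_numeric))
print(cleaned)
```

Here 'abc1' survives because re.fullmatch requires the whole string to be digits, while '42', 1, 1000 and the existing NaN all end up as NaN.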
Upvotes: 2