Reputation: 1119
I want to find all the strings in my dataframe and replace them with NaN values, so that I can then drop all the associated rows with df.dropna(). For example, say I have the following data set:
x = np.array([1,2,np.NaN,4,5,6,7,8,9,10])
z = np.array([1,2,np.NaN,4,5,np.NaN,7,8,9,"My Name is Jeff"])
y = np.array(["Hello World",2,3,4,5,6,7,8,9,10])
I should first be able to dynamically replace all strings with np.nan, so my output should be:
x = np.array([1,2,np.NaN,4,5,6,7,8,9,10])
z = np.array([1,2,np.NaN,4,5,np.NaN,7,8,9,np.NaN])
y = np.array([np.NaN,2,3,4,5,6,7,8,9,10])
and then running df.dropna() (assume that x, y, and z are columns of a DataFrame, not separate variables) should leave me with:
x = np.array([2,4,5,7,8,9])
z = np.array([2,4,5,7,8,9])
y = np.array([2,4,5,7,8,9])
Upvotes: 0
Views: 1960
Reputation: 33
Please find the following:
df = pd.DataFrame([x, y, z])

def Replace(i):
    # Anything float() cannot parse becomes NaN
    try:
        return float(i)
    except (ValueError, TypeError):
        return np.nan

df = df.applymap(Replace)
df = df.dropna(axis=1)
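For reference, here is a runnable sketch of this approach applied to the question's data, with x, y, z as columns so a plain dropna() removes the bad rows. Note that applymap was deprecated in favor of DataFrame.map in recent pandas, so the sketch picks whichever is available:

```python
import numpy as np
import pandas as pd

def to_float_or_nan(i):
    # Anything float() cannot parse becomes NaN
    try:
        return float(i)
    except (ValueError, TypeError):
        return np.nan

x = np.array([1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10], dtype=object)
z = np.array([1, 2, np.nan, 4, 5, np.nan, 7, 8, 9, "My Name is Jeff"], dtype=object)
y = np.array(["Hello World", 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=object)

df = pd.DataFrame({'x': x, 'y': y, 'z': z})

# DataFrame.map replaced applymap in pandas >= 2.1; fall back for older versions
mapper = df.map if hasattr(df, "map") else df.applymap
cleaned = mapper(to_float_or_nan).dropna()
print(cleaned['x'].tolist())  # [2.0, 4.0, 5.0, 7.0, 8.0, 9.0]
```

This matches the expected output in the question: the rows containing "Hello World", "My Name is Jeff", and the original NaNs are all dropped.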
Upvotes: 1
Reputation: 1119
I think the following is the simplest rendition: the function cleanData takes a DataFrame and an optional list of columns to ignore. It replaces every string in the remaining columns with NaN and then drops the rows containing those NaN values.
def cleanData(file, ignore=()):
    for column in file.columns:
        # Skip any columns the caller asked to ignore;
        # coerce the rest to numeric so strings become NaN
        if column not in ignore:
            file[column] = pd.to_numeric(file[column], errors='coerce')
    file = file.dropna()
    return file
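As a quick check, here is a sketch of calling it with the ignore argument. The 'label' column name and its contents are made up for illustration; the point is that an ignored text column survives while strings elsewhere are coerced to NaN and their rows dropped:

```python
import numpy as np
import pandas as pd

def cleanData(file, ignore=()):
    # Coerce every non-ignored column to numeric; strings become NaN
    for column in file.columns:
        if column not in ignore:
            file[column] = pd.to_numeric(file[column], errors='coerce')
    return file.dropna()

df = pd.DataFrame({
    'x': [1, 2, np.nan, 4],
    'label': ['a', 'b', 'c', 'd'],  # hypothetical text column to preserve
    'z': [1, 'oops', 3, 4],
})

cleaned = cleanData(df, ignore=['label'])
print(cleaned)
```

Row 1 goes because 'oops' coerces to NaN, row 2 goes because of the original NaN in x, and the 'label' column is left untouched.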
Upvotes: 0
Reputation: 674
This works I think:
df = pd.DataFrame(data={'A': [1, 2, 'str'], 'B': ['name', 2, 2]})

for column in df.columns:
    # NaN out string entries, keep numeric ones
    df[column] = df[column].apply(lambda x: np.nan if isinstance(x, str) else x)

print(df)
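To complete the round trip from the question, dropna() still has to be called afterwards, since this loop only replaces the strings. A runnable sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A': [1, 2, 'str'], 'B': ['name', 2, 2]})

for column in df.columns:
    # Replace string entries with NaN, leave numeric values untouched
    df[column] = df[column].apply(lambda v: np.nan if isinstance(v, str) else v)

df = df.dropna()
print(df)  # only the middle row (A=2, B=2) survives
```

Only the row with no string in either column remains after the drop.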
Upvotes: 0