Zakariah Siyaji

Reputation: 1119

How do I replace all string values with NaN (Dynamically)?

I want to find all the strings in my DataFrame and replace them with NaN, so that I can then drop the affected rows with df.dropna(). For example, given the following data set:

x = np.array([1,2,np.nan,4,5,6,7,8,9,10])
z = np.array([1,2,np.nan,4,5,np.nan,7,8,9,"My Name is Jeff"])
y = np.array(["Hello World",2,3,4,5,6,7,8,9,10])

I should first be able to dynamically replace all strings with np.nan, so that the output is:

x = np.array([1,2,np.nan,4,5,6,7,8,9,10])
z = np.array([1,2,np.nan,4,5,np.nan,7,8,9,np.nan])
y = np.array([np.nan,2,3,4,5,6,7,8,9,10])

Running df.dropna() afterwards (assume that x, y and z are columns of a DataFrame, not separate variables) should then give me:

x = np.array([2,4,5,7,8,9])
z = np.array([2,4,5,7,8,9])
y = np.array([2,4,5,7,8,9])

Upvotes: 0

Views: 1960

Answers (4)

Shiva

Reputation: 33

Please find the following:

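df = pd.DataFrame([x, y, z])

def Replace(i):
    # Return the value as a float, or NaN if it cannot be converted.
    try:
        return float(i)
    except (TypeError, ValueError):
        return np.nan

# On pandas 2.1+, DataFrame.map is the preferred name for applymap.
df = df.applymap(func=Replace)
# Each array is a row here, so drop the columns that contain NaN.
df.dropna(axis=1)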


Upvotes: 1

Zakariah Siyaji

Reputation: 1119

I think the following is the simplest version. The function cleanData takes a DataFrame (named file here) and a list of columns to ignore. It replaces every non-numeric value in the remaining columns with NaN and then drops the rows that contain NaN.

def cleanData(file, ignore=()):
    for column in file.columns:
        # Skip the columns the caller asked to leave alone.
        if column not in ignore:
            # Values that cannot be parsed as numbers become NaN.
            file[column] = pd.to_numeric(file[column], errors='coerce')
    # Drop every row that contains a NaN.
    return file.dropna()
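
One thing to watch: the function above assigns into the DataFrame it receives, so the caller's frame is changed as well. Here is a minimal sketch of a variant that works on a copy instead. The name clean_data_copy and the sample data are made up for illustration and are not part of the original answer.

```python
import pandas as pd

def clean_data_copy(frame, ignore=()):
    # Hypothetical helper: work on a copy so the caller's DataFrame is untouched.
    result = frame.copy()
    for column in result.columns:
        if column not in ignore:
            # Values that cannot be parsed as numbers become NaN.
            result[column] = pd.to_numeric(result[column], errors="coerce")
    # Drop rows that contain NaN in any column, including ignored ones.
    return result.dropna()

# Made-up sample data: "label" holds text that should be kept as-is.
raw = pd.DataFrame({
    "label": ["a", "b", "c"],
    "value": [1, "oops", 3],
})
cleaned = clean_data_copy(raw, ignore=["label"])
```

Here the "label" column keeps its text because it is listed in ignore, while the row whose "value" is not numeric is dropped.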

Upvotes: 0

Parijat Bhatt

Reputation: 674

I think this works:

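import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A': [1, 2, 'str'], 'B': ['name', 2, 2]})
for column in df.columns:
    # Replace every str value with NaN; leave everything else untouched.
    df[column] = df[column].apply(lambda x: np.nan if isinstance(x, str) else x)
print(df)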
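
Note that a type check treats every str as text, including numeric-looking strings such as "2". That matters for the arrays in the question, because NumPy stores a mixed list of numbers and text as an array of strings. A small sketch of the difference, using a made-up column:

```python
import numpy as np
import pandas as pd

# Made-up column: the string "2" looks numeric, "name" does not.
s = pd.Series(["name", "2", 3], dtype=object)

# Type check: every str becomes NaN, including the numeric-looking "2".
by_type = s.apply(lambda v: np.nan if isinstance(v, str) else v)

# Parsing check: only values that cannot be parsed as numbers become NaN.
by_parse = pd.to_numeric(s, errors="coerce")
```

With the type check only the integer 3 survives, whereas pd.to_numeric also keeps "2" as the number 2.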

Upvotes: 0

BENY

Reputation: 323226

Since you tagged the question with pandas:

pd.to_numeric(x,errors='coerce')
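
pd.to_numeric works on one array or Series at a time, so for a whole DataFrame it can be applied column by column. A minimal sketch using the arrays from the question, assuming they are placed in a DataFrame with the column names "x", "z" and "y" (the names are my choice):

```python
import numpy as np
import pandas as pd

# Arrays from the question; mixing numbers and text makes z and y string arrays.
x = np.array([1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10])
z = np.array([1, 2, np.nan, 4, 5, np.nan, 7, 8, 9, "My Name is Jeff"])
y = np.array(["Hello World", 2, 3, 4, 5, 6, 7, 8, 9, 10])

df = pd.DataFrame({"x": x, "z": z, "y": y})

# Coerce column by column: anything that cannot be parsed as a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop every row that has a NaN in any column.
cleaned = numeric.dropna()
```

The rows left in cleaned are the ones where all three columns hold a number, which matches the 2, 4, 5, 7, 8, 9 result asked for in the question.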

Upvotes: 3
