How to turn unwanted string values into NaNs in pandas

Question

I am struggling with one task. I imported an unclean dataframe in which some columns that are supposed to contain only float values also contain strings. This corrupts my data and prevents me from performing a regression.

I have a dataframe X with an "investment_rounds" column that contains mixed data types.

I want something like

np.where(X["investment_rounds"] == np.dtype.str, np.nan, X)

Any ideas?

Chris · Accepted Answer

The key here is the errors='coerce' parameter of pd.to_numeric.

Per the documentation, it replaces any value that cannot be converted to a number with NaN.

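import pandas as pd

# Sample column with numeric strings mixed with unparseable values
df = pd.DataFrame({'investment_rounds':['1.0','2.0','bad','data','3.0']})

# Values that cannot be parsed become NaN; the column ends up as float64
df['investment_rounds'] = pd.to_numeric(df['investment_rounds'], errors='coerce')
print(df)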

Output

    investment_rounds
0   1.0
1   2.0
2   NaN
3   NaN
4   3.0
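
If several columns need the same treatment before a regression, the idea above can be extended roughly as follows. This is only a sketch: the frame X, the "funding_total" column, and the sample values are made up for illustration, and dropping incomplete rows is just one possible way to handle the resulting NaNs.

```python
import pandas as pd

# Hypothetical frame X: real floats mixed with stray strings and a missing value
X = pd.DataFrame({
    "investment_rounds": [1.0, "2.0", "bad", None, 3.5],
    "funding_total": ["100", "n/a", 250.0, "300", "400"],  # made-up column
})

numeric_cols = ["investment_rounds", "funding_total"]

# Apply to_numeric column by column; unparseable values become NaN
cleaned = X[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Cells that held a value before but are NaN now, i.e. the coerced strings
newly_bad = cleaned.isna() & X[numeric_cols].notna()
print(newly_bad.sum())

# One option before fitting a model: keep only rows with no NaN at all
X_clean = cleaned.dropna()
print(X_clean)
```

Counting the coerced cells first is a cheap sanity check: if a large share of a column turns into NaN, the column may need a different fix (for example stripping currency symbols) rather than coercion.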
