Reputation: 57
I'm stuck on one task. I imported a messy dataframe, and some columns that should contain only float values also contain strings, which corrupts my data and prevents me from running a regression.
Say I have a dataframe X with an "investment_rounds" column that contains mixed data types.
I want something like
np.where(X["investment_rounds"] == np.dtype.str, np.nan, X)
Any ideas?
Upvotes: 0
Views: 514
Reputation: 16147
The key here is the errors='coerce' parameter of to_numeric. Per the documentation, it replaces any value that cannot be converted with NaN.
import pandas as pd
df = pd.DataFrame({'investment_rounds':['1.0','2.0','bad','data','3.0']})
df['investment_rounds'] = pd.to_numeric(df['investment_rounds'], errors='coerce')
Output
   investment_rounds
0                1.0
1                2.0
2                NaN
3                NaN
4                3.0
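Before running the regression, it can help to see which values were coerced and then drop those rows. Below is a minimal sketch of that follow-up step. The sample data and the names bad_mask, bad_values and clean are made up for illustration; only the "investment_rounds" column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data: numeric strings, a real float, junk text, and a missing value.
X = pd.DataFrame({"investment_rounds": ["1.0", 2.0, "bad", None, "3.0"]})

# Convert what can be converted; everything else becomes NaN.
converted = pd.to_numeric(X["investment_rounds"], errors="coerce")

# Values that were present originally but are NaN after conversion
# are the ones that could not be parsed as numbers.
bad_mask = converted.isna() & X["investment_rounds"].notna()
bad_values = X.loc[bad_mask, "investment_rounds"].tolist()
print("Could not convert:", bad_values)

# Store the numeric column and drop the rows that have no usable value.
X["investment_rounds"] = converted
clean = X.dropna(subset=["investment_rounds"])
print(clean)
```

Whether to drop the NaN rows or fill them in (for example with the column median) depends on the data; dropping is just the simplest choice here.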
Upvotes: 1