Reputation: 334
I am working on a dataset with more than 60M rows in pandas. I suspect that one of my numeric columns contains a non-numeric character, which produces the error message "invalid literal for float(): 4010146209+".
I am able to load the column as object but not as float or int.
I have tried replacing r"\d" and "+" with "".
I need to either remove the rows that have a non-numeric character in that column, or strip out every character that keeps the column from being loaded as float or int.
The column contains NaN, but these are dropped before I try to cast as float.
Upvotes: 3
Views: 2893
Reputation: 42885
You could use .replace() with a regular expression to keep the numeric values, rather than converting them to np.nan with pd.to_numeric:
df['col_name'] = df['col_name'].replace(to_replace=r'[^0-9]+', value='', regex=True)
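Note that the pattern [^0-9]+ also removes decimal points and minus signs, so it only suits integer-like values. A minimal sketch of a variant that keeps those characters and then casts to float might look like the following; the sample values are made up for illustration, and only col_name comes from the line above.

```python
import pandas as pd

# Hypothetical sample data standing in for the real column
df = pd.DataFrame({"col_name": ["4010146209+", "123", "45.5", "-7"]})

# Drop everything except digits, the decimal point and the minus sign
cleaned = df["col_name"].str.replace(r"[^0-9.\-]+", "", regex=True)

# The cast succeeds once the stray characters are gone
df["col_name"] = cleaned.astype(float)
```

This assumes every value still forms a valid number after stripping; a value such as "1.2.3" would still make the cast fail.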
Upvotes: 1
Reputation: 210872
I would use the pd.to_numeric() function for that.
Demo:
In [583]: a
Out[583]:
0                         50.5
1                         50.7
2                         50.9
3                       52.70+
4                         52.9
5                       520.31
6    really bad number: 520.92
Name: Price, dtype: object

In [584]: a = pd.to_numeric(a, errors='coerce')

In [585]: a
Out[585]:
0     50.50
1     50.70
2     50.90
3       NaN
4     52.90
5    520.31
6       NaN
Name: Price, dtype: float64
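If the goal is to remove the offending rows rather than keep them as NaN, the coerced result can double as a filter. The sketch below assumes a DataFrame with a Price column like the one in the demo; the names converted and bad_rows are only illustrative.

```python
import pandas as pd

# Hypothetical frame mirroring part of the demo data
df = pd.DataFrame(
    {"Price": ["50.5", "52.70+", "52.9", "really bad number: 520.92"]}
)

# Values that cannot be parsed become NaN instead of raising
converted = pd.to_numeric(df["Price"], errors="coerce")

# Look at the rows that failed to parse before discarding them
bad_rows = df[converted.isna()]

# Keep only the parseable rows and store the numeric values
df = df[converted.notna()].assign(Price=converted)
```

Because existing NaN values also show up in converted.isna(), this works most cleanly when they are dropped first, as the question describes.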
Upvotes: 2