user6453877

Reputation: 334

Invalid literal for float() in Pandas

I am working with a dataset of more than 60M rows in Pandas. I suspect that one of my numeric columns contains a non-numeric character, which gives me the error message "invalid literal for float(): 4010146209+".

I am able to load the column as object, but not as float or int.

I have tried replacing r"\d" and "+" with "".

I need to either remove the rows with non-numeric characters in the given column, or remove all the characters that keep the column from being loaded as float or int.

The column contains NaN, but these are dropped before I try to cast as float.

Upvotes: 3

Views: 2893

Answers (2)

Stefan

Reputation: 42885

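You could use .replace() with a regular expression to strip the non-digit characters and keep the numeric values, rather than converting them to np.nan with pd.to_numeric. Note that this pattern also removes decimal points and minus signs, so it only suits columns of non-negative integers: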

df['col_name'] = df['col_name'].replace(to_replace=r'[^0-9]+', value='', regex=True)
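
The cleaned column still holds strings, so it needs a conversion step before it behaves as a numeric column. A minimal sketch of the strip-then-convert approach, assuming a column of non-negative integers with stray characters; the sample data and the `cleaned` variable are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; 'col_name' stands in for the real column.
df = pd.DataFrame({"col_name": ["4010146209+", "123", None, "77"]})

# Strip every non-digit character from the string values.
# Missing values pass through as NaN.
cleaned = df["col_name"].str.replace(r"[^0-9]+", "", regex=True)

# Convert to numbers. errors="coerce" turns anything that still cannot be
# parsed (for example a value left empty after stripping) into NaN.
df["col_name"] = pd.to_numeric(cleaned, errors="coerce")

print(df["col_name"].dtype)
```

Because the column contains a missing value, the result is a float column; dropping the NaN rows first and calling `.astype("int64")` would be one way to get integers.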

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210872

I would use the pd.to_numeric() function for that.

Demo:

In [583]: a
Out[583]:
0                         50.5
1                         50.7
2                         50.9
3                       52.70+
4                         52.9
5                       520.31
6    really bad number: 520.92
Name: Price, dtype: object

In [584]: a = pd.to_numeric(a, errors='coerce')

In [585]: a
Out[585]:
0     50.50
1     50.70
2     50.90
3       NaN
4     52.90
5    520.31
6       NaN
Name: Price, dtype: float64
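
If the goal is to remove the offending rows rather than keep them as NaN, the coerced result can double as a filter. A rough sketch, assuming the data lives in a DataFrame; the names `df`, `parsed`, `bad_rows` and `clean` are illustrative and not from the question:

```python
import pandas as pd

# Hypothetical frame with the same kind of values as the demo above.
df = pd.DataFrame(
    {"Price": ["50.5", "52.70+", "52.9", "really bad number: 520.92"]}
)

# Values that cannot be parsed become NaN instead of raising an error.
parsed = pd.to_numeric(df["Price"], errors="coerce")

# Rows that were present in the source but failed to parse, for inspection.
bad_rows = df[parsed.isna() & df["Price"].notna()]

# Keep only the rows that parsed, and store the numeric values.
clean = df[parsed.notna()].assign(Price=parsed)

print(bad_rows)
print(clean)
```

Looking at `bad_rows` before dropping anything shows which characters are actually causing the failure, which can be useful on a 60M-row column.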

Upvotes: 2
