Reputation: 99
I'm trying to convert a column using pd.to_numeric, but for some reason it turns all values (except one) into NaN:
In[]: pd.to_numeric(portfolio["Principal Remaining"],errors="coerce")
Out[]:
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 836.61
19 NaN
20 NaN
...
Name: Principal Remaining, Length: 32314, dtype: float64
Thoughts on why this is happening? The original data looks like this:
1 18,052.02
2 27,759.85
3 54,061.75
4 89,363.61
5 46,954.46
6 64,295.64
7 100,000.00
8 27,905.98
9 13,821.48
10 16,937.89
...
Name: Principal Remaining, Length: 32314, dtype: object
Upvotes: 2
Views: 3266
Reputation: 402323
read_csv with thousands=','
df = pd.read_csv('file.csv', thousands=',')
This handles the thousands separators while the file is being read, so the column should load as numeric from the start.
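As a minimal sketch of this option, the snippet below reads a small in-memory CSV instead of 'file.csv'. The sample text and the use of io.StringIO are assumptions for illustration only; note that values containing commas have to be quoted in a comma-delimited file.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'file.csv'. Values that contain commas
# must be quoted, otherwise the comma is read as a field delimiter.
csv_text = 'Principal Remaining\n"18,052.02"\n"27,759.85"\n836.61\n'

# thousands=',' tells the parser to treat ',' inside numbers as a
# thousands separator rather than leaving the column as strings.
df = pd.read_csv(io.StringIO(csv_text), thousands=',')

print(df['Principal Remaining'].dtype)
print(df)
```

If the column still comes back as object, some values probably contain other non-numeric characters (currency symbols, stray whitespace) that need separate cleaning.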
replace and to_numeric
df['Principal Remaining'] = pd.to_numeric(
    df['Principal Remaining'].str.replace(',', ''), errors='coerce')
If the first option isn't available, you'll need to remove the commas first using str.replace, then call pd.to_numeric, as shown here.
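To see why the original call produced NaN almost everywhere, here is a small sketch on a made-up Series that mirrors the question's column. The sample values are an assumption; the likely explanation is that pd.to_numeric cannot parse strings containing commas, so errors="coerce" turns them into NaN, while a value such as 836.61 has no comma and converts fine.

```python
import pandas as pd

# Hypothetical sample mirroring the question's column (strings, dtype object).
s = pd.Series(['18,052.02', '27,759.85', '836.61'], name='Principal Remaining')

# Without cleaning: strings with commas fail to parse and are coerced to NaN.
raw = pd.to_numeric(s, errors='coerce')

# Strip the thousands separators as literal text, then convert.
cleaned = pd.to_numeric(s.str.replace(',', '', regex=False), errors='coerce')

print(raw)
print(cleaned)
```

Passing regex=False is optional here; it just makes explicit that ',' is treated as a literal character rather than a pattern.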
Upvotes: 11