Reputation: 83
Ok. I give up.
I have a DataFrame with a column ("Amount") of large numbers:
Amount
-1 000 000,00
4 848 903,00
-2 949 234,00
13 038 023,00
7 985 232,00
....
I want to convert these to numbers that I can calculate with.
Let's investigate:
>type(b["Amount"][0])
str
Ok, it's a string.
>float("-1 000 000,00".replace(' ', '').replace(',','.'))
-1000000.00
Ok, works great!
To make a lambda thingy (to process all elements in column), I need it in a function:
def make_float(num):
num = num.replace(' ','').replace(',','.')
return float(num)
>make_float(b["Amount"][0])
ValueError: could not convert string to float: −1 000 000.00
What?!
>b["Amount"][0].replace(' ','').replace(',','.')
Out[258]:
'\xe2\x88\x921\xc2\xa0000\xc2\xa0000.00'
Oh no!! Unicode hell! I give up.
Does Python have an easy function/method that will convert my numbers (including negative) to something I can calculate with?
Upvotes: 4
Views: 11989
Reputation: 83
Ok. This seemed to do the trick. It's a solution in 3 steps.
I checked my dataset with chardetect data.csv
It said 'utf8' with a confidence 0.99
I made sure to pass that into my pandas.read_csv: pandas.read_csv(data....., encoding = 'utf8')
I made a function,
def make_float(num):
num = num.replace(u'\N{MINUS SIGN}', '-') # encode the minus sign
num = num.replace(',','.') #change the decimal separator from comma to dot
num = num.replace(unichr(160), '') # encode the (non-breaking) space to ''
num = float(num)
return num
I then passed this function to the pandas.read_csv thing with `pandas.read_csv(data...., encoding='utf8', converters={'Amount':make_float}) `
Working good so far.
Upvotes: 0
Reputation: 23064
Your data contains unicode minus sign (one of several minus signs in unicode) and non breaking space (one of several space characters in unicode)
You can use str.translate()
to convert characters to a format that can be correctly parsed by float()
.
def make_float(num):
return float(num.translate({0x2c: '.', 0xa0: None, 0x2212: '-'}))
make_float('−1\xa0000\xa0000,00')
Upvotes: 1
Reputation: 2909
What if you try to encode it?
def make_float(num):
num = num.encode('latin-1').replace(' ','').replace(',','.')
return float(num)
Upvotes: 1
Reputation: 164623
This should solve your issue. The problem is to get the first value of the column as a value you should use pd.Series.values[0]
.
import pandas as pd
s = pd.Series(['-1 000 000,00'])
def make_float(num):
num = num.replace(' ','').replace(',','.')
return float(num)
s.map(make_float)
# 0 -1000000.0
# dtype: float64
make_float(s.values[0])
# -1000000.0
Upvotes: 1
Reputation: 82755
looks like you have a problem with the minus('-') symbol in the string.
Try:
def make_float(num):
num = num.replace(' ','').replace(',','.').replace("−", "-")
return float(num)
Upvotes: 2