Cronos
Cronos

Reputation: 83

Converting negative number in string to float (Python)?

Ok. I give up.

I have a DataFrame with a column ("Amount") of large numbers:

Amount
-1 000 000,00
 4 848 903,00
-2 949 234,00
13 038 023,00
 7 985 232,00
 ....

I want to convert these to numbers that I can calculate with.

Let's investigate:

>type(b["Amount"][0])
str

Ok, it's a string.

>float("-1 000 000,00".replace(' ', '').replace(',','.'))
-1000000.00

Ok, works great!

To make a lambda thingy (to process all elements in column), I need it in a function:

def make_float(num):
    num = num.replace(' ','').replace(',','.')
    return float(num)


>make_float(b["Amount"][0])
ValueError: could not convert string to float: −1 000 000.00

What?!

>b["Amount"][0].replace(' ','').replace(',','.')
Out[258]:
'\xe2\x88\x921\xc2\xa0000\xc2\xa0000.00'

Oh no!! Unicode hell! I give up.

Does Python have an easy function/method that will convert my numbers (including negative) to something I can calculate with?

Upvotes: 4

Views: 11989

Answers (5)

Cronos
Cronos

Reputation: 83

Ok. This seemed to do the trick. It's a solution in 3 steps.

  1. I checked my dataset with chardetect data.csv It said 'utf8' with a confidence 0.99

  2. I made sure to pass that into my pandas.read_csv: pandas.read_csv(data....., encoding = 'utf8')

  3. I made a function,

def make_float(num):
    num = num.replace(u'\N{MINUS SIGN}', '-')  # encode the minus sign
    num = num.replace(',','.') #change the decimal separator from comma to dot
    num = num.replace(unichr(160), '') # encode the (non-breaking) space to ''
    num = float(num)
    return num

I then passed this function to the pandas.read_csv thing with `pandas.read_csv(data...., encoding='utf8', converters={'Amount':make_float}) `

Working good so far.

Upvotes: 0

Håken Lid
Håken Lid

Reputation: 23064

Your data contains unicode minus sign (one of several minus signs in unicode) and non breaking space (one of several space characters in unicode)

You can use str.translate() to convert characters to a format that can be correctly parsed by float().

def make_float(num):
     return float(num.translate({0x2c: '.', 0xa0: None, 0x2212: '-'}))

make_float('−1\xa0000\xa0000,00')

Upvotes: 1

Igor
Igor

Reputation: 2909

What if you try to encode it?

def make_float(num):
    num = num.encode('latin-1').replace(' ','').replace(',','.')
    return float(num)

Upvotes: 1

jpp
jpp

Reputation: 164623

This should solve your issue. The problem is to get the first value of the column as a value you should use pd.Series.values[0].

import pandas as pd

s = pd.Series(['-1 000 000,00'])

def make_float(num):
    num = num.replace(' ','').replace(',','.')
    return float(num)

s.map(make_float)

# 0   -1000000.0
# dtype: float64

make_float(s.values[0])
# -1000000.0

Upvotes: 1

Rakesh
Rakesh

Reputation: 82755

looks like you have a problem with the minus('-') symbol in the string.

Try:

def make_float(num):
    num = num.replace(' ','').replace(',','.').replace("−", "-")
    return float(num)

Upvotes: 2

Related Questions