user9762321

Reputation: 93

Error trying to convert Object to Float in python

I have a file that lists deposit balances as strings. In order to plot these numbers, I'm trying to convert the column from object to float. So I wrote code to remove the $ and to strip the spaces before and after the values.

member_clean.TotalDepositBalances = member_clean.TotalDepositBalances.str.replace('$', '')

member_clean['TotalDepositBalances'] = member_clean['TotalDepositBalances'].str.strip()

member_clean['TotalDepositBalances'] = member_clean['TotalDepositBalances'].astype(float)

When I run the code, I get an error message that says

ValueError: could not convert string to float:

That's it. Before I added the str.strip, the error message showed me that some values had spaces before and after, so I knew to remove those. But I'm a little confused about what else is causing it.

I looked at the values of the column after I removed the spaces and $, and everything looks normal. Here's a sample.

  1. 309.00
  2. 38.00
  3. 12,486.00
  4. 6,108.00
  5. 2,537.00

Any ideas what I could check for in the column that may be causing this error?

Upvotes: 2

Views: 1175

Answers (2)

Massifox

Reputation: 4487

You have to remove the commas; Python's float() does not accept thousands separators in a numeric string. So, taking the list you gave as possible input:

str_num = ['309.00 ', ' 38.00 ', ' 12,486.00 ', '6,108.00', ' 2,537.00']

you have to do this:

list(map(lambda s: float(s.replace(',', '')), str_num))

which gives your list of floats:

[309.0, 38.0, 12486.0, 6108.0, 2537.0]

Note: You don't need to call str.strip(), because float() ignores leading and trailing whitespace when it parses a string.
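As a quick illustration of the note above, here is a minimal sketch in plain Python. The helper name `to_float` is hypothetical and only used for this example; it is not part of the asker's code.

```python
# float() tolerates surrounding whitespace...
print(float('  38.00 '))

# ...but it rejects a thousands separator.
try:
    float('12,486.00')
except ValueError as exc:
    print(f'ValueError: {exc}')


def to_float(text):
    """Hypothetical helper: drop commas, then let float() handle whitespace."""
    return float(text.replace(',', ''))


print(to_float(' 12,486.00 '))
```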

Following your pipeline, before converting to float, you need to do:

member_clean['TotalDepositBalances'] = member_clean['TotalDepositBalances'].str.replace(',', '')

Or you can run your entire pipeline on one line of code as follows:

member_clean['TotalDepositBalances'] = member_clean['TotalDepositBalances'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
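If the conversion still fails after the commas are gone, one way to see which values are responsible is to coerce instead of raising. The sketch below assumes made-up sample data (the `raw` Series stands in for the real column) and uses `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN so they can be filtered out and inspected. An error message with nothing after the colon may point to an empty string, but that is only a guess until you look at the data.

```python
import pandas as pd

# Hypothetical sample standing in for member_clean['TotalDepositBalances'].
raw = pd.Series(['$309.00 ', ' $38.00', '$12,486.00', '', 'n/a'])

# regex=False treats '$' as a literal character, not the end-of-string anchor.
cleaned = (
    raw.str.replace('$', '', regex=False)
       .str.replace(',', '', regex=False)
       .str.strip()
)

# errors='coerce' yields NaN for anything that cannot be parsed as a number.
numeric = pd.to_numeric(cleaned, errors='coerce')

# The original values that failed to convert are the ones worth inspecting.
bad_values = raw[numeric.isna()]
print(bad_values.tolist())
```

Once the offending entries are known, you can decide whether to drop them, fill them, or fix them at the source before calling astype(float).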

Extra: Performance

Here you will find tests that compare various methods for performing multiple substitutions in a string. Surprisingly, chaining replace calls (as in your pipeline) turns out to be more efficient than a regex for this type of operation. It's worth a read.
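If you want to check this on your own machine, a rough sketch with the standard-library `timeit` module could look like the following. The function names and the sample string are made up for the illustration, and the timings will vary with Python version and input, so treat the result as indicative only.

```python
import re
import timeit

SAMPLE = ' $12,486.00 '  # hypothetical input value


def clean_chained(text):
    """Remove '$' and ',' with chained str.replace calls."""
    return text.replace('$', '').replace(',', '').strip()


_PATTERN = re.compile(r'[$,]')


def clean_regex(text):
    """Remove '$' and ',' with a single precompiled regex."""
    return _PATTERN.sub('', text).strip()


# Both approaches should produce the same cleaned string.
assert clean_chained(SAMPLE) == clean_regex(SAMPLE)

for fn in (clean_chained, clean_regex):
    seconds = timeit.timeit(lambda: fn(SAMPLE), number=100_000)
    print(f'{fn.__name__}: {seconds:.3f}s')
```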

Upvotes: 3

Yaakov Bressler

Reputation: 12078

A useful method for working with large datasets or series is to create a lookup dictionary of corrected values so that duplicate values aren't re-calculated:

import re

import numpy as np
import pandas as pd

def fast_num_conversion(s):
    """
    This is an extremely fast approach to parsing messy numbers to floats.
    For large data, the same values are often repeated. Rather than
    re-parse these, we collect all unique values, parse them once, and
    use a lookup to convert all figures.
    (Should be 10X faster than without lookup dict)

       Note, input must be a pandas series.
    """
    # Strip '$', '-', ',', '|' and spaces. Note that removing '-' also
    # drops the sign of negative numbers.
    f_convert = lambda x: re.sub(r'[$\-,\| ]', '', x)
    # Empty strings become NaN instead of raising ValueError.
    f_float = lambda x: float(x) if x != '' else np.nan
    # Parse each distinct value once, then map the results back.
    vals = {curr: f_float(f_convert(curr)) for curr in s.unique()}
    return s.map(vals)

str_num = ['309.00', '38 .00 ', '12, 486.00', '6,108.00', '2,537.00']

print(fast_num_conversion(pd.Series(str_num)))
0      309.0
1       38.0
2    12486.0
3     6108.0
4     2537.0
dtype: float64

Upvotes: 0
