Shinchan

Reputation: 612

Python Invalid literal for float()

I am using the HIGGS dataset for my data mining project. While parsing the data in Python, I got the following error:

ValueError: invalid literal for float(): -8.854051232337951660e-

I get this error for many values of the same kind. I am using Apache Spark as the distributed environment.

This is the row from the dataset:

1.000000000000000000e+00,8.004817962646484375e-01,-3.643184900283813477e-01,-4.785313606262207031e-01,2.399173498153686523e+00,**-8.854051232337951660e-01**,1.204909682273864746e+00,-8.518521487712860107e-02,1.364478588104248047e+00,0.000000000000000000e+00,4.605550169944763184e-01,1.564514338970184326e-01,1.068501710891723633e+00,0.000000000000000000e+00,1.793796300888061523e+00,1.236290574073791504e+00,5.773849487304687500e-01,2.548224449157714844e+00,1.083405137062072754e+00,1.178002059459686279e-01,-1.116195082664489746e+00,0.000000000000000000e+00,8.484367132186889648e-01,1.113812208175659180e+00,9.878969192504882812e-01,5.820630192756652832e-01,4.325648546218872070e-01,1.004681587219238281e+00,8.518054485321044922e-01

I have checked, and there are no discrepancies in the data.

Can someone help me with this error message?

Upvotes: 3

Views: 2009

Answers (2)

glglgl

Reputation: 91149

According to

ValueError: invalid literal for float(): -8.854051232337951660e-

the parser splits up that value too early.

So you should look at what the items look like once the line has been split up.

So try

for x in line.split(','):
    # Print the raw token first so it is visible even if float() fails on it.
    print(repr(x))
    print(repr(float(x)))

and you'll see what happens for each item.
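If the file is large, printing every item gets noisy. A minimal sketch of the same idea that only reports the items float() rejects, along with where they sit, might look like this. The helper name find_bad_fields and the sample lines are made up for illustration; the second and third sample lines imitate a stray line break in the middle of a value.

```python
def find_bad_fields(lines):
    """Return (line_number, field_index, raw_field) for every field float() rejects."""
    bad = []
    for line_number, line in enumerate(lines, start=1):
        for field_index, field in enumerate(line.split(",")):
            try:
                float(field)
            except ValueError:
                bad.append((line_number, field_index, field))
    return bad


# Hypothetical sample: the value is cut after "e-" and continues on the next line.
sample = [
    "1.0e+00,8.0e-01",
    "2.399e+00,-8.854051232337951660e-",
    "01,1.2e+00",
]

bad_fields = find_bad_fields(sample)
for line_number, field_index, field in bad_fields:
    print(line_number, field_index, repr(field))
```

Note that the leftover "01" on the following line parses without complaint, so only the truncated half is reported.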

Personally, I have no idea why this might happen, except for a corrupted data file that has a line break or comma where it shouldn't.

Upvotes: 0

en_Knight

Reputation: 5381

As the exception suggests,

-8.854051232337951660e- is not a valid float in Python

In particular, scientific notation is fine, but it needs to have something after that e; your data is malformed. The following would be acceptable:

  • -8.854051232337951660e-1
  • -8.854051232337951660
  • -8.854051232337951660e1
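A quick way to check this yourself is to pass each string to float() and see which ones are rejected. This is only a sketch: try_float is a made-up helper name, and the exact wording of the error message differs between Python versions.

```python
candidates = [
    "-8.854051232337951660e-1",
    "-8.854051232337951660",
    "-8.854051232337951660e1",
    "-8.854051232337951660e-",  # the truncated value from the error message
]


def try_float(text):
    """Return the parsed float, or None if float() rejects the text."""
    try:
        return float(text)
    except ValueError:
        return None


results = {text: try_float(text) for text in candidates}
for text, value in results.items():
    print(repr(text), "->", value)
```

Only the last candidate, the one with nothing after "e-", comes back as None.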

Or from the docs if you prefer

Some examples of floating point literals:

3.14 10. .001 1e100 3.14e-10 0e0

The data without a trailing digit does not mean anything. Without the e, Python can assume the literal has terminated; with one or more additional digits, Python can expand the scientific notation.

If the data looks fine to you but Python can't seem to figure out what's (supposed to be) going on, check for subtle mis-formatting, such as blank space between the e and the next digit.
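To illustrate with some made-up strings: float() ignores whitespace around a number, but not whitespace inside it. Printing repr() of the string makes the hidden characters visible. The helper name parses is hypothetical.

```python
samples = [
    "-8.85e-01",      # clean
    " -8.85e-01\n",   # leading space and trailing newline: still accepted
    "-8.85e- 01",     # space between "e-" and the digits: rejected
    "-8.85e-\n01",    # newline between "e-" and the digits: rejected
]


def parses(text):
    """Return True if float() accepts the text, False otherwise."""
    try:
        float(text)
        return True
    except ValueError:
        return False


outcome = {text: parses(text) for text in samples}
for text, ok in outcome.items():
    # repr() shows spaces and newlines that are easy to miss by eye
    print(repr(text), "ok" if ok else "rejected")
```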

In response to edit

That last point is key. The data looks good to you but Python complains; that's because the way you are "parsing" in Python doesn't match the way you parse with your eyes and brain. What are you using to parse the data? Do you split by comma? Do you split where digits start (that would cause problems)? The exception is raised as described above; for you, the problem is tracking down why your parse is cropping the last digits. (By the way, that sounds like a new question to me, not a continuation of this one.)

For example, in your newly posted code, it looks like there is a newline after the "e-" and before the "01". If that's just my browser, then... oh well. If not, then that is your problem.

To skip the erroneous entries, you can do something like this (tl;dr try/except them, because it's better to ask forgiveness than permission)
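A minimal sketch of that try/except approach, assuming each record is one comma-separated string. The name parse_row and the sample lines are made up for illustration and are not taken from the asker's code.

```python
def parse_row(line):
    """Parse one comma-separated line into floats; return None if any field is malformed."""
    try:
        return [float(field) for field in line.split(",")]
    except ValueError:
        return None


# Hypothetical input: one good row, one row with a truncated value, one empty line.
lines = [
    "1.0e+00,8.0e-01,-8.854051232337951660e-01",
    "1.0e+00,-8.854051232337951660e-",
    "",
]

# Keep only the rows that parsed cleanly.
rows = [row for row in (parse_row(line) for line in lines) if row is not None]
print(len(rows), "of", len(lines), "rows kept")
```

With Spark, the same function could probably be passed to map on an RDD of lines, followed by a filter that drops the None results. Bear in mind that skipping rows only hides the problem; if many values fail, it is still worth finding out why they are being cut off.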

Upvotes: 2
