PatrikH

Reputation: 23

numpy genfromtxt or numpy loadtxt: "ValueError: could not convert string to float" or "too many values to unpack", tried almost everything

I have a very frustrating problem that I have been trying to solve for many hours now. I have gone through practically every relevant question and answer I could find here and on Google.

What I would like to do: I have big datasets (50-70k rows in a CSV) of photometry data that I would like to load, work with as floats, and eventually plot with some fittings and calculations.

An example of the data:

Time(s),AnalogIn-1,AnalogIn-2
0.00E+00,3.96E-02,3.33E-02
0.00E+00,4.10E-02,3.33E-02

So each column contains many numbers in scientific notation.

In my code I first used the following to load the text:

time, dat1, dat2= np.loadtxt(path, skiprows=1, unpack=True, delimiter=",")

and it keeps throwing "ValueError: could not convert string to float:".

It works fine if I first open the CSV in e.g. Excel and convert the whole sheet from 'General' to 'Number'.

I tried literally everything discussed here, starting with skipping headers and first rows, with np.loadtxt, np.genfromtxt and the pandas loader. I also tried changing datatypes, writing converters, and re-mapping whatever was loaded to floats. This helped, but only for certain rows; the error soon reappeared at seemingly random rows, or came back as 'too many values to unpack'. I also tried skipping blanks and NaNs.

I suspect the problem is still somewhere in the conversion: the scientific notation is indeed a string and it contains 'E', '+' and '-' characters in 'random' order. I still believe I'm missing some very easy solution, as my CSV is really standard data.
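A minimal sketch of the converter-style workaround I mean (the assumed 'E+-' malformation is only a guess at what the bad fields might look like, and the column mapping is illustrative):

import numpy as np

# Sketch only: a converter that tries to repair a malformed exponent such as
# '3.33E+-02' before float() sees it. The 'E+-' pattern is an assumption.
def to_float(s):
    if isinstance(s, bytes):          # older numpy passes bytes to converters
        s = s.decode()
    return float(s.replace('E+-', 'E-'))

time, dat1, dat2 = np.loadtxt(
    path, skiprows=1, unpack=True, delimiter=',',
    converters={0: to_float, 1: to_float, 2: to_float})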

Upvotes: 1

Views: 8870

Answers (2)

hpaulj

Reputation: 231665

With your sample, loadtxt works fine:

In [142]: np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1)
Out[142]: 
array([[ 0.    ,  0.0396,  0.0333],
       [ 0.    ,  0.041 ,  0.0333]])
In [143]: time,dat1,dat2=np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,
     ...: unpack=True)
In [144]: time,dat1,dat2
Out[144]: (array([ 0.,  0.]), array([ 0.0396,  0.041 ]), array([ 0.0333,  0.0333]))

Now if I change one of the txt lines to:

0.00E+00,3.96E-02,3.33E+-02

I get an error like yours:

In [146]: np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-146-ff3d27e104fc> in <module>()
----> 1 np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1)

/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
   1022 
   1023             # Convert each value according to its column and store
-> 1024             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1025             # Then pack it according to the dtype's nesting
   1026             items = pack_items(items, packing)

/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in <listcomp>(.0)
   1022 
   1023             # Convert each value according to its column and store
-> 1024             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1025             # Then pack it according to the dtype's nesting
   1026             items = pack_items(items, packing)

/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in floatconv(x)
    723         if b'0x' in x:
    724             return float.fromhex(asstr(x))
--> 725         return float(x)
    726 
    727     typ = dtype.type

ValueError: could not convert string to float: b'3.33E+-02'

Notice that my error shows the problem string. Does yours do that as well? If so, why didn't you include that information? You also don't include any of the traceback. We don't need to see it in all its glory, but some of it helps to set the context.

I tried the +- because I vaguely recall some SO questions along that line: either a Python formatter producing that kind of exponential, or something having problems reading it. We could search for details if needed.

If the load works for some lines, but fails on others, you need to isolate the problem lines, and test them.
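A quick way to do that isolation is a plain Python loop over the file (a sketch; the file name is just a placeholder):

# Sketch: scan the file and report any field that float() rejects.
with open('data1.csv') as f:
    next(f)                                  # skip the header line
    for lineno, line in enumerate(f, start=2):
        for field in line.strip().split(','):
            try:
                float(field)
            except ValueError:
                print('line %d: bad field %r' % (lineno, field))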


Downloading your link, I have no problem loading the file:

In [147]: np.loadtxt('/home/paul/Downloads/data1.csv', delimiter=',',skiprows=1)
Out[147]: 
array([[  0.00000000e+00,   3.96000000e-02,   3.33000000e-02],
       [  0.00000000e+00,   4.10000000e-02,   3.33000000e-02],
       [  6.94000000e-04,   4.10000000e-02,   3.40000000e-02],
       ..., 
       [  8.02000000e+00,   3.96000000e-02,   3.19000000e-02],
       [  8.02000000e+00,   3.82000000e-02,   3.33000000e-02],
       [  8.02000000e+00,   3.75000000e-02,   3.33000000e-02]])
In [148]: data = _
In [149]: data.shape
Out[149]: (71600, 3)

'Too many values to unpack' - I don't like to use unpack unless I know for sure the number of columns in the file (and probably not even then).

In [169]: a1,a2 = np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,unpack=
     ...: True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-169-4dea7c2876c1> in <module>()
----> 1 a1,a2 = np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,unpack=True)

ValueError: too many values to unpack (expected 2)

Again, the full error message matters - you left off the '(expected 2)' part. The sample file produces 3 columns, so I get this error if I provide the wrong number of variables.

With unpack it may be wise to specify which columns to read, e.g.

In [170]: a1,a2 = np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,unpack=
     ...: True, usecols=[1,2])
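Or skip unpack entirely and slice the 2-d result, so the number of columns never has to be guessed (a sketch on the same file):

import numpy as np

# Sketch: load everything as one 2-d array, then take columns by index.
data = np.loadtxt('data1.csv', delimiter=',', skiprows=1)
time = data[:, 0]
dat1 = data[:, 1]
dat2 = data[:, 2]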

Upvotes: 1

Warren Weckesser

Reputation: 114946

This is really just a long comment, but if it identifies the problem, it might be an answer.

With the CSV file that you linked to in a comment, I ran

time, dat1, dat2 = np.loadtxt("data1.csv", skiprows=1, unpack=True, delimiter=",")

and it worked with no errors.

When I inspected the file, I noticed that the line endings were a single carriage return character (often abbreviated CR, hex code 0d). You mentioned using Excel, so I assume you are using Windows. The usual line ending in Windows is CR+LF (two characters: carriage return followed by linefeed; hex 0d0a).

That might be the problem (but I expected Python file I/O to take care of this). I don't have a Windows system to test this, so at the moment all I can say is "try this":

with open('data1.csv', 'r', newline='\r') as f:
    time, dat1, dat2 = np.loadtxt(f, skiprows=1, unpack=True, delimiter=",")
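If you want to confirm what line endings the file actually has before trying that, peeking at the raw bytes is enough (a sketch; adjust the file name):

# Sketch: read a small chunk of raw bytes; lines ending in b'\r' alone are
# old Mac-style CR, b'\r\n' is Windows CRLF, and b'\n' is Unix LF.
with open('data1.csv', 'rb') as f:
    print(f.read(200))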

Upvotes: 2
