Reputation: 23
I have a very frustrating problem that I have been trying to solve for many hours now. I have really exhausted all the relevant questions and answers I could find on Google.
What I would like to do: I have big datasets of photometry data (CSV files with 50-70k rows) that I would like to load in, work with as floats, and eventually plot with some fits and calculations.
An example of the data:
Time(s),AnalogIn-1,AnalogIn-2
0.00E+00,3.96E-02,3.33E-02
0.00E+00,4.10E-02,3.33E-02
So each column contains many numbers in scientific notation.
In my code I first used the following to load the text:
time, dat1, dat2= np.loadtxt(path, skiprows=1, unpack=True, delimiter=",")
And it keeps raising "ValueError: could not convert string to float:".
It works fine if I go to, for example, Excel and convert the whole CSV sheet from 'General' to 'Number'.
I tried literally everything discussed here, starting with skipping headers and first rows, with np.loadtxt
, np.genfromtxt
, and the pandas loader. I also tried changing dtypes, fixing converters, and re-mapping whatever was loaded to floats. This helped, but only for certain rows: the error soon popped up again at random rows, or 'Too many values to unpack' came back. I also tried skipping blank lines and NaNs. I suspect the problem is still somewhere in the conversion: the scientific notation is indeed a string, and it contains "E", "+" and "-" characters in "random" order. I still believe I'm missing some very easy solution, as my CSV is really standard data.
Upvotes: 1
Views: 8870
Reputation: 231665
With your sample, loadtxt
works fine:
In [142]: np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1)
Out[142]:
array([[ 0. , 0.0396, 0.0333],
[ 0. , 0.041 , 0.0333]])
In [143]: time,dat1,dat2=np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,
...: unpack=True)
In [144]: time,dat1,dat2
Out[144]: (array([ 0., 0.]), array([ 0.0396, 0.041 ]), array([ 0.0333, 0.0333]))
Now if I change one of the txt lines to:
0.00E+00,3.96E-02,3.33E+-02
I get an error like yours:
In [146]: np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-146-ff3d27e104fc> in <module>()
----> 1 np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1)
/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
1022
1023 # Convert each value according to its column and store
-> 1024 items = [conv(val) for (conv, val) in zip(converters, vals)]
1025 # Then pack it according to the dtype's nesting
1026 items = pack_items(items, packing)
/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in <listcomp>(.0)
1022
1023 # Convert each value according to its column and store
-> 1024 items = [conv(val) for (conv, val) in zip(converters, vals)]
1025 # Then pack it according to the dtype's nesting
1026 items = pack_items(items, packing)
/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py in floatconv(x)
723 if b'0x' in x:
724 return float.fromhex(asstr(x))
--> 725 return float(x)
726
727 typ = dtype.type
ValueError: could not convert string to float: b'3.33E+-02'
Notice that my error shows the problem string. Does yours do that as well? If so, why didn't you include that information? Also, you don't include any of the traceback. We don't need to see it in all its glory, but some of it helps when setting context.
I tried the +-
because I vaguely recall some SO questions along that line: either a Python formatter producing that kind of exponent, or loadtxt having problems reading it. We could search for the details if needed.
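If the file really does contain malformed exponents like E+-02, a per-column converter can repair the token before the float conversion. This is only a sketch: fixed_float is a made-up helper, and the str.replace cleanup is an assumption about what the bad tokens look like.

```python
import numpy as np

def fixed_float(s):
    # loadtxt may hand the converter bytes (older NumPy) or str (newer).
    if isinstance(s, bytes):
        s = s.decode()
    # Assumed cleanup: collapse a doubled sign like 'E+-02' into 'E-02'.
    return float(s.replace("E+-", "E-").replace("E-+", "E+"))

txt = """Time(s),AnalogIn-1,AnalogIn-2
0.00E+00,3.96E-02,3.33E+-02
"""
data = np.loadtxt(txt.splitlines(), delimiter=",", skiprows=1,
                  converters={0: fixed_float, 1: fixed_float, 2: fixed_float})
# the malformed third field now parses as 0.0333
```

Only do this after confirming what the bad fields actually contain; silently "repairing" data you haven't inspected can hide real corruption.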
If the load works for some lines but fails on others, you need to isolate the problem lines and test them.
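One way to isolate them is a small scan that tries float() on every field and reports the line number and offending token. A sketch; find_bad_lines is a made-up helper name:

```python
def find_bad_lines(lines, delimiter=",", skiprows=1):
    """Return (line_number, field) pairs for fields that don't parse as float."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if lineno <= skiprows:
            continue
        for field in line.strip().split(delimiter):
            try:
                float(field)
            except ValueError:
                bad.append((lineno, field))
    return bad

sample = """Time(s),AnalogIn-1,AnalogIn-2
0.00E+00,3.96E-02,3.33E-02
0.00E+00,4.10E-02,3.33E+-02
"""
print(find_bad_lines(sample.splitlines()))  # [(3, '3.33E+-02')]
```

With the bad line numbers in hand, you can open the file in an editor and look at exactly those rows.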
Downloading your link, I have no problem loading the file:
In [147]: np.loadtxt('/home/paul/Downloads/data1.csv', delimiter=',',skiprows=1)
Out[147]:
array([[ 0.00000000e+00, 3.96000000e-02, 3.33000000e-02],
[ 0.00000000e+00, 4.10000000e-02, 3.33000000e-02],
[ 6.94000000e-04, 4.10000000e-02, 3.40000000e-02],
...,
[ 8.02000000e+00, 3.96000000e-02, 3.19000000e-02],
[ 8.02000000e+00, 3.82000000e-02, 3.33000000e-02],
[ 8.02000000e+00, 3.75000000e-02, 3.33000000e-02]])
In [148]: data = _
In [149]: data.shape
Out[149]: (71600, 3)
'Too many values to unpack' - I don't like to use unpack
unless I know for sure the number of columns in the file (and probably not even then).
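One way to avoid unpack entirely is to load the 2-D array and index columns explicitly. A self-contained sketch using the sample data from above:

```python
import numpy as np

txt = """Time(s),AnalogIn-1,AnalogIn-2
0.00E+00,3.96E-02,3.33E-02
0.00E+00,4.10E-02,3.33E-02
"""
data = np.loadtxt(txt.splitlines(), delimiter=",", skiprows=1)
# Slice columns by index; a wrong column count shows up as a clear
# shape/index problem instead of a misleading unpack error.
time, dat1, dat2 = data[:, 0], data[:, 1], data[:, 2]
```

You can also check data.shape first and decide how many columns you actually have before naming them.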
In [169]: a1,a2 = np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,unpack=
...: True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-169-4dea7c2876c1> in <module>()
----> 1 a1,a2 = np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,unpack=True)
ValueError: too many values to unpack (expected 2)
Again, note the full error message - you left off the "expected 2"
part. The file sample produces 3 columns, so I get this error if I provide the wrong number of variables.
With unpack
it may be wise to specify the columns explicitly, e.g.
In [170]: a1,a2 = np.loadtxt(txt.splitlines(), delimiter=',',skiprows=1,unpack=
...: True, usecols=[1,2])
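A self-contained version of that call, so the unpack target count always matches the requested columns:

```python
import numpy as np

txt = """Time(s),AnalogIn-1,AnalogIn-2
0.00E+00,3.96E-02,3.33E-02
0.00E+00,4.10E-02,3.33E-02
"""
# usecols=[1, 2] requests exactly two columns, matching the two targets.
a1, a2 = np.loadtxt(txt.splitlines(), delimiter=",", skiprows=1,
                    unpack=True, usecols=[1, 2])
```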
Upvotes: 1
Reputation: 114946
This is really just a long comment, but if it identifies the problem, it might be an answer.
With the CSV file that you linked to in a comment, I ran
time, dat1, dat2 = np.loadtxt("data1.csv", skiprows=1, unpack=True, delimiter=",")
and it worked with no errors.
When I inspected the file, I noticed that the line endings were a single carriage return character (often abbreviated CR, hex code 0d
). You mentioned using Excel, so I assume you are using Windows. The usual line ending in Windows is CR+LF (two characters: carriage return followed by a linefeed; hex 0d0a
).
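To check which endings a file actually uses, read it in binary mode so no newline translation happens, and count the terminators. A sketch; ending_counts is a made-up helper, and 'data1.csv' stands in for the real file:

```python
def ending_counts(raw: bytes):
    """Count CRLF, bare CR, and bare LF line endings in raw bytes."""
    crlf = raw.count(b"\r\n")
    # Bare CR / LF = totals minus those that belong to CRLF pairs.
    return {"CRLF": crlf,
            "CR": raw.count(b"\r") - crlf,
            "LF": raw.count(b"\n") - crlf}

print(ending_counts(b"a\rb\rc\r"))   # {'CRLF': 0, 'CR': 3, 'LF': 0}
print(ending_counts(b"a\r\nb\r\n"))  # {'CRLF': 2, 'CR': 0, 'LF': 0}

# On a real file:
# with open('data1.csv', 'rb') as f:
#     print(ending_counts(f.read()))
```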
That might be the problem (but I expected Python file I/O to take care of this). I don't have a Windows system to test this, so at the moment all I can say is "try this":
with open('data1.csv', 'r', newline='\r') as f:
    time, dat1, dat2 = np.loadtxt(f, skiprows=1, unpack=True, delimiter=",")
Upvotes: 2