I have this piece of code, where I try to load four columns from a CSV file:
import numpy as np
# np.float and np.str were aliases of the builtins float and str; NumPy 1.24 removed them
rtype = np.dtype([('1', float), ('2', float), ('3', float), ('tier', str, 32)])
x1, x2, x3, x4 = np.genfromtxt("../Data/out.txt", dtype=rtype, skip_header=1, delimiter=",", usecols=(3, 4, 5, 6), unpack=True)
But I get an error:
ValueError: too many values to unpack (expected 4)
This is a little strange, because I have four variables and I load four columns.
How do I load them correctly?
IMHO, the problem is in np.dtype, because without it everything works fine (though with different types). I use Python 3.
It looks like you have text like:
In [447]: txt=b"""1.2 3.3 2.0 str
...: 3.3 3.3 2.2 astring
...: """
My first choice is genfromtxt with dtype=None (automatic dtype determination):
In [448]: np.genfromtxt(txt.splitlines(),dtype=None)
Out[448]:
array([(1.2, 3.3, 2.0, b'str'), (3.3, 3.3, 2.2, b'astring')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S7')])
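A minimal sketch of the same call on a current NumPy, assuming the sample text is held in a Python str and wrapped in io.StringIO (both additions, not part of the session above). Passing encoding="utf-8" makes the text column come back as str (a 'U' field) rather than bytes; without it, recent NumPy versions may warn when dtype=None has to read strings.

```python
import io

import numpy as np

# Same two sample rows as above, as a str rather than bytes
text = """1.2 3.3 2.0 str
3.3 3.3 2.2 astring
"""

# dtype=None infers one type per column;
# encoding="utf-8" makes the text column a unicode ('U') field
data = np.genfromtxt(io.StringIO(text), dtype=None, encoding="utf-8")

print(data.dtype)   # f0..f2 are floats, f3 is a unicode string field
print(data["f3"])
```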
Without dtype it tries to make everything float, including the string column:
In [449]: np.genfromtxt(txt.splitlines())
Out[449]:
array([[ 1.2, 3.3, 2. , nan],
[ 3.3, 3.3, 2.2, nan]])
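Because genfromtxt quietly turns unparseable text into nan when the dtype is float, one quick diagnostic is to look for columns that are nan in every row. A small sketch on the same sample data; note that a numeric column whose values are all missing would be flagged too:

```python
import io

import numpy as np

text = """1.2 3.3 2.0 str
3.3 3.3 2.2 astring
"""

# Default dtype is float: the text column cannot be parsed and becomes nan
arr = np.genfromtxt(io.StringIO(text))

# A column that is nan in every row most likely holds non-numeric text
non_numeric = np.isnan(arr).all(axis=0)
print(arr.shape)     # two rows, four columns
print(non_numeric)   # only the last column is flagged
```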
I don't use unpack much, preferring to get one 2d or structured array. But with unpack:
In [450]: x1,x2,x3,x4=np.genfromtxt(txt.splitlines(),unpack=True)
In [451]: x1
Out[451]: array([ 1.2, 3.3])
In [452]: x4
Out[452]: array([ nan, nan])
I still get the nan for the string column.
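For a plain 2-D float result, unpack=True amounts to transposing it, so each name receives one column. A short check of that equivalence on the same sample text:

```python
import io

import numpy as np

text = """1.2 3.3 2.0 str
3.3 3.3 2.2 astring
"""

arr = np.genfromtxt(io.StringIO(text))  # shape (rows, columns)
x1, x2, x3, x4 = np.genfromtxt(io.StringIO(text), unpack=True)

# Each unpacked name is one column of arr, i.e. one row of arr.T
for unpacked, column in zip((x1, x2, x3, x4), arr.T):
    print(np.array_equal(unpacked, column, equal_nan=True))
```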
Borrowing the dtype from the dtype=None case:
In [456]: dt=np.dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S7')])
In [457]: dt
Out[457]: dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S7')])
In [458]: np.genfromtxt(txt.splitlines(),unpack=True,dtype=dt)
Out[458]:
array([(1.2, 3.3, 2.0, b'str'), (3.3, 3.3, 2.2, b'astring')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S7')])
In [459]: _.shape
Out[459]: (2,)
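With this compound dtype, unpack gives me one item per row of the text, not one item per column. In other words, unpack does not split up the structured fields (at least in the NumPy version used here; NumPy 1.20 and later return one array per field when unpack=True is combined with a structured dtype).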
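This is where the question's ValueError comes from: a structured result is a 1-D array with one record per row, so tuple-unpacking it into four names only works if the file happens to have exactly four data rows. The sketch below reproduces the message with five made-up rows. It tuple-unpacks the array directly instead of passing unpack, so the outcome does not depend on how a given NumPy version treats unpack with structured dtypes.

```python
import io

import numpy as np

# Five hypothetical data rows in the same layout as the sample above
text = """1.2 3.3 2.0 a
3.3 3.3 2.2 b
0.5 1.0 1.5 c
2.0 2.5 3.0 d
4.0 4.5 5.0 e
"""

dt = np.dtype([("f0", float), ("f1", float), ("f2", float), ("f3", "U32")])
data = np.genfromtxt(io.StringIO(text), dtype=dt, encoding="utf-8")
print(data.shape)  # one record per row, not one array per column

try:
    x1, x2, x3, x4 = data  # iterates over the five records
except ValueError as exc:
    print("ValueError:", exc)
```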
One way to handle the string column and still use unpack is to read the text twice:
first load the float columns:
In [462]: x1,x2,x3=np.genfromtxt(txt.splitlines(),unpack=True,usecols=[0,1,2])
In [463]: x3
Out[463]: array([ 2. , 2.2])
then load the string column, with dtype=None or S32:
In [466]: x4=np.genfromtxt(txt.splitlines(),unpack=True,usecols=[3],dtype=None)
In [467]: x4
Out[467]:
array([b'str', b'astring'],
dtype='|S7')
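The two passes can be wrapped in one small function. This is only a sketch: the helper name load_floats_and_labels and its parameters are hypothetical, and it takes the text as a string so it can be read twice; for a real file you would pass the filename to each genfromtxt call instead. It uses 'U32' (unicode) rather than S32 so the labels come back as str.

```python
import io

import numpy as np


def load_floats_and_labels(text, float_cols, label_col, **kwargs):
    """Hypothetical helper: read float columns and one text column in two passes.

    Assumes at least two float columns and several data rows, so the first
    pass comes back as a 2-D array that unpacks into one array per column.
    """
    # Pass 1: numeric columns; unpack=True transposes so each row is a column
    floats = np.genfromtxt(io.StringIO(text), usecols=float_cols,
                           unpack=True, encoding="utf-8", **kwargs)
    # Pass 2: the text column, as unicode strings of up to 32 characters
    labels = np.genfromtxt(io.StringIO(text), usecols=[label_col],
                           dtype="U32", encoding="utf-8", **kwargs)
    return (*floats, labels)


text = """1.2 3.3 2.0 str
3.3 3.3 2.2 astring
"""
x1, x2, x3, x4 = load_floats_and_labels(text, [0, 1, 2], 3)
print(x3)
print(x4)
```

Extra keyword arguments such as delimiter="," or skip_header=1 are forwarded to both passes, which keeps the two reads consistent with each other.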
Another option is to load the structured array and unpack the fields individually:
In [468]: data = np.genfromtxt(txt.splitlines(),dtype=None)
In [469]: data.dtype
Out[469]: dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S7')])
In [470]: x1, x2, x3 = data['f0'],data['f1'],data['f2']
In [471]: x4 = data['f3']
In [472]: x4
Out[472]:
array([b'str', b'astring'],
dtype='|S7')
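Applied to the question, the field-by-field approach might look like the sketch below. The CSV text is invented to match the question's layout (a header line, then the wanted data in columns 3 to 6), since ../Data/out.txt is not shown; np.float and np.str are replaced by the builtins float and str, which they aliased. The encoding="utf-8" argument is an addition so the tier field is read as str.

```python
import io

import numpy as np

# Hypothetical stand-in for ../Data/out.txt: a header plus seven columns
csv_text = """id,a,b,x1,x2,x3,tier
0,9,9,1.0,2.0,3.0,gold
1,9,9,4.0,5.0,6.0,silver
2,9,9,7.0,8.0,9.0,bronze
"""

# Same layout as the question's rtype, using builtin float/str
rtype = np.dtype([("1", float), ("2", float), ("3", float), ("tier", str, 32)])

data = np.genfromtxt(io.StringIO(csv_text), dtype=rtype, skip_header=1,
                     delimiter=",", usecols=(3, 4, 5, 6), encoding="utf-8")

# One 1-D array per field, in field order, whatever the number of rows
x1, x2, x3, x4 = (data[name] for name in data.dtype.names)
print(x1, x4)
```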
The safest way to use genfromtxt is:
data = np.genfromtxt(...)
print(data.shape)
print(data.dtype)
and then make sure you understand that shape and dtype before moving on to using the data array.
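Those checks can be folded into a tiny helper that also reports whether the result is structured, which is the case where tuple-unpacking walks over rows rather than columns. The name describe_array is hypothetical, and the sample text is the one used above:

```python
import io

import numpy as np


def describe_array(data):
    """Hypothetical helper: summarize what genfromtxt returned."""
    structured = data.dtype.names is not None
    return {
        "shape": data.shape,
        "dtype": data.dtype,
        # A structured genfromtxt result holds one record per row,
        # so tuple-unpacking it iterates over rows, not fields
        "structured": structured,
        "fields": data.dtype.names if structured else None,
    }


text = """1.2 3.3 2.0 str
3.3 3.3 2.2 astring
"""
print(describe_array(np.genfromtxt(io.StringIO(text))))
print(describe_array(np.genfromtxt(io.StringIO(text), dtype=None, encoding="utf-8")))
```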