QuantumPanda

Reputation: 303

genfromtxt only imports first column, after changing dtype

I am importing huge data sets containing various types of data, using genfromtxt.
My original code worked fine (ucols is the list of columns I want to load):

data = np.genfromtxt(fname,comments = '#', skip_header=1, usecols=(ucols))

Some of my values are strings, so to avoid getting NaN entries I tried setting dtype = None:

data = np.genfromtxt(fname, dtype = None, comments = '#', skip_header=1, usecols=(ucols))

Now, for some reason, I only get one column of data, i.e. the first column. Can someone explain what I am doing wrong?

EDIT: I now understand that I am supposed to get a 1D structured array, which can be indexed to get a whole row of values. However, I want my data as a regular NumPy array. Is it possible to use genfromtxt with dtype = None and still get a regular array instead of a structured one? Alternatively, is there a quick way to convert between the two? I would rather avoid the second approach unless it is fast and efficient, since I usually work with much larger data sets than this one.

Upvotes: 0

Views: 544

Answers (1)

hpaulj

Reputation: 231385

Make a structured array and write it to csv:

In [131]: arr=np.ones((3,), dtype='i,f,U10,i,f')
In [132]: arr['f2']=['a','bc','def']
In [133]: arr
Out[133]: 
array([(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
      dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '<U10'), ('f3', '<i4'), ('f4', '<f4')])
In [134]: np.savetxt('test',arr,fmt='%d,%e,%s,%d,%f')
In [135]: cat test
1,1.000000e+00,a,1,1.000000
1,1.000000e+00,bc,1,1.000000
1,1.000000e+00,def,1,1.000000

Load all columns with dtype=None:

In [137]: np.genfromtxt('test',delimiter=',',dtype=None,encoding=None)
Out[137]: 
array([(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U3'), ('f3', '<i8'), ('f4', '<f8')])
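
The result is 1-D, with one record per row, which is presumably why it looks like a single column in the question. A small sketch of how such an array is indexed; the array literal is copied from Out[137], and the names `shape`, `col` and `row` are just for illustration:

```python
import numpy as np

# Rebuild the array shown in Out[137] so this snippet runs on its own.
arr = np.array(
    [(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
    dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U3'), ('f3', '<i8'), ('f4', '<f8')],
)

shape = arr.shape   # 1-D: one record per row of the file
col = arr['f2']     # a whole column, selected by field name
row = arr[0]        # a whole row, returned as a single record
```

Each field keeps its own dtype, so `arr['f2']` is a string array while `arr['f1']` is a float array.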

Load a subset of the columns:

In [138]: np.genfromtxt('test',delimiter=',',dtype=None,encoding=None,usecols=
     ...: (1,2,4))
Out[138]: 
array([(1., 'a', 1.), (1., 'bc', 1.), (1., 'def', 1.)],
      dtype=[('f0', '<f8'), ('f1', '<U3'), ('f2', '<f8')])
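
On the follow-up in the question's EDIT: a regular 2-D array has a single dtype, so with mixed columns something has to give. Below is a rough sketch of two options, not a definitive recipe. It assumes NumPy 1.16 or later (for `structured_to_unstructured`) and uses an in-memory copy of the 'test' file. The names `text`, `numeric_names`, `numeric` and `as_str` are made up for this example.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Same three rows as the 'test' file above, held in memory for a self-contained sketch.
text = (
    "1,1.000000e+00,a,1,1.000000\n"
    "1,1.000000e+00,bc,1,1.000000\n"
    "1,1.000000e+00,def,1,1.000000\n"
)

# dtype=None infers a type per column, so mixed columns give a 1-D structured array.
structured = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=None, encoding=None)

# Option 1: keep only the integer/float fields and convert them to a plain 2-D float array.
numeric_names = [n for n in structured.dtype.names if structured.dtype[n].kind in 'iuf']
numeric = rfn.structured_to_unstructured(structured[numeric_names], dtype=float)

# Option 2: read every column as a string, which yields a plain 2-D array directly.
as_str = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=str, encoding=None)
```

Option 1 drops the string column; option 2 keeps every column but leaves the numeric conversion to you. I have not timed either on large files.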

Upvotes: 1
