Reputation: 373
As seen here:
http://library.isr.ist.utl.pt/docs/numpy/user/basics.io.genfromtxt.html#choosing-the-data-type
"In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as items in the sequence. The field names are defined with the names keyword."
How do I get around this? I want to use genfromtxt on a data file whose columns are, e.g., int, string, int.
If I do:
dtype=(int, "|S5", int)
Then the shape changes from (x, y) to just (x,), and I get 'too many indices' errors when I try to use masks.
When I use dtype=None I get to keep the 2D structure, but it often guesses the type wrong when the first row of a column looks like it could be a number (this happens often in my data set).
What is the best way to get around this?
Upvotes: 1
Views: 1461
Reputation: 65851
You cannot have a 2D array here: every element of a NumPy array shares one dtype, so a 2D array would need each row to be a 1D array with mixed dtypes, which is not possible.
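If a plain 2D layout matters more than keeping numeric types, one possible workaround is to read every column as a string and convert individual columns afterwards. This is only a sketch: io.StringIO stands in for the real file, and the variable names are made up for illustration.

```python
import io
import numpy as np

# Stand-in for the real data file (assumption: whitespace-separated columns).
text = "42 foo 41\n40 bar 39\n"

# With a single string dtype every cell has the same type, so the result is 2D.
as_text = np.genfromtxt(io.StringIO(text), dtype=str)
print(as_text.shape)

# Numeric columns have to be converted explicitly when they are needed.
first_col = as_text[:, 0].astype(int)
print(first_col)
```

The trade-off is that comparisons such as `as_text[:, 0] == 40` will not match until the column is converted, so the structured array is usually the more convenient choice.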
Having an array of records shouldn't be a problem:
In [1]: import numpy as np
In [2]: !cat test.txt
42 foo 41
40 bar 39
In [3]: data = np.genfromtxt('test.txt',
..: dtype=np.dtype([('f1', int), ('f2', np.str_, 5), ('f3', int)]))
In [4]: data
Out[4]:
array([(42, 'foo', 41), (40, 'bar', 39)],
dtype=[('f1', '<i8'), ('f2', '<U5'), ('f3', '<i8')])
In [5]: data['f3']
Out[5]: array([41, 39])
In [6]: data['f3'][1]
Out[6]: 39
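To see why 2D-style indexing raises 'too many indices', it may help to inspect the shape and field names of the record array. The sketch below reuses the same two-row data through io.StringIO (an assumption in place of test.txt) and also shows one way to build a true 2D array from the numeric fields only; `numeric` is just an illustrative name.

```python
import io
import numpy as np

# Same content as test.txt above, held in memory for a self-contained example.
text = "42 foo 41\n40 bar 39\n"
dt = np.dtype([('f1', int), ('f2', np.str_, 5), ('f3', int)])
data = np.genfromtxt(io.StringIO(text), dtype=dt)

# The array is 1D: one record per row, with columns addressed by field name.
print(data.shape)
print(data.dtype.names)

# Fields that share a dtype can be stacked back into an ordinary 2D array.
numeric = np.column_stack([data['f1'], data['f3']])
print(numeric.shape)
```

So `data[0]['f3']` or `data['f3'][0]` replaces what would have been `data[0, 2]` on a 2D array.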
If you need a masked array, see the question "How can I mask elements of a record array in Numpy?"
To select rows by the value in the first column, index with a boolean mask:
In [7]: data['f1'] == 40
Out[7]: array([False, True], dtype=bool)
In [8]: data[data['f1'] == 40]
Out[8]:
array([(40, 'bar', 39)],
dtype=[('f1', '<i8'), ('f2', '<U5'), ('f3', '<i8')])
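Boolean masks built from different fields can also be combined, since each field comparison yields a 1D boolean array of the same length. A small sketch under the same assumptions as before (in-memory data instead of test.txt, illustrative variable names):

```python
import io
import numpy as np

text = "42 foo 41\n40 bar 39\n"
dt = np.dtype([('f1', int), ('f2', np.str_, 5), ('f3', int)])
data = np.genfromtxt(io.StringIO(text), dtype=dt)

# Combine conditions on two fields; the parentheses are required around each comparison.
mask = (data['f1'] > 40) & (data['f2'] == 'foo')
selected = data[mask]
print(selected)

# The same mask can index a single field to pull out just that column's values.
print(data['f3'][mask])
```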
Upvotes: 1