Reputation: 65
I have a file whose lines consist of 2 integers and one float. I read the file with numpy:
import numpy as np
dt = np.dtype([('pre', np.dtype('i4'), 2), ('data', np.float64)])
a = np.fromfile("myfile", dtype=dt)
array([([65536, 65536], 0.2 ), ([65536, 1], 1.33566434),
([65536, 2], 2.06068931), ..., ([65535, 479], 0.33333333),
([65535, 2295], 0.09090909), ([65535, 249], 0.07692308)],
dtype=[('pre', '<i4', (2,)), ('data', '<f8')])
I actually have two questions.
First: when I iterate over a with np.nditer, I can't access a[0][0][0], for example. Why is that, and how should np.nditer be used?
Second: how can I sort the elements by the first entry in the ['pre'] list, and then by the second entry in ['pre']? The wanted output would look like:
array([([1, 1], 0.2 ), ([1, 2], 1.33566434),
([1, 3], 2.06068931), ..., ([2, 1], 0.33333333),
([2, 2], 0.09090909), ([2, 3], 0.07692308)],
dtype=[('pre', '<i4', (2,)), ('data', '<f8')])
Any suggestions are welcome, including changing the data type used to read the file if that would help. Performance matters as well, because the file is very large. Thanks
Upvotes: 0
Views: 61
Reputation: 231395
You have a 1d structured array:
In [56]: arr = np.array([([65536, 65536], 0.2 ), ([65536, 1], 1.33566434),
...: ([65536, 2], 2.06068931), ([65535, 479], 0.33333333),
...: ([65535, 2295], 0.09090909), ([65535, 249], 0.07692308)],
...: dtype=[('pre', '<i4', (2,)), ('data', '<f8')])
...:
In [57]: arr
Out[57]:
array([([65536, 65536], 0.2 ), ([65536, 1], 1.33566434),
([65536, 2], 2.06068931), ([65535, 479], 0.33333333),
([65535, 2295], 0.09090909), ([65535, 249], 0.07692308)],
dtype=[('pre', '<i4', (2,)), ('data', '<f8')])
In [58]: arr.shape
Out[58]: (6,)
In [59]: arr.dtype
Out[59]: dtype([('pre', '<i4', (2,)), ('data', '<f8')])
In [60]: arr['pre']
Out[60]:
array([[65536, 65536],
[65536, 1],
[65536, 2],
[65535, 479],
[65535, 2295],
[65535, 249]], dtype=int32)
In [61]: arr['data']
Out[61]:
array([0.2 , 1.33566434, 2.06068931, 0.33333333, 0.09090909,
0.07692308])
It has 2 fields. The pre field has 2 elements per record, so arr['pre'] is a 2d numeric array.
As a general rule you don't need to use nditer to iterate through an array. It's useful when developing Cython code, but isn't needed in Python code.
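For a large file it is usually faster to skip element-by-element iteration entirely and work on whole fields at once. The following is only a minimal sketch of that idea, assuming a small in-memory array with the same dtype as in the question; the names first_ids, mask, selected and total are made up for illustration:

```python
import numpy as np

# Same layout as in the question: two int32 values plus one float64 per record.
dt = np.dtype([('pre', '<i4', (2,)), ('data', '<f8')])
arr = np.array([([65536, 65536], 0.2), ([65536, 1], 1.33566434),
                ([65535, 479], 0.33333333)], dtype=dt)

# Field access returns plain numeric arrays, so no Python loop is needed.
first_ids = arr['pre'][:, 0]      # first integer of every record
second_ids = arr['pre'][:, 1]     # second integer of every record

# Boolean masks built from a field can index the structured array directly.
mask = first_ids == 65535
selected = arr[mask]              # records whose first integer is 65535
total = arr['data'][mask].sum()   # sum of their float values
```

Here arr['pre'] and arr['data'] are views on the original records, so this kind of access does not copy the data.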
If you use nditer you get a () shape array with the original dtype:
In [70]: for x in np.nditer(arr):
...: print(x)
([65536, 65536], 0.2)
([65536, 1], 1.33566434)
([65536, 2], 2.06068931)
([65535, 479], 0.33333333)
([65535, 2295], 0.09090909)
([65535, 249], 0.07692308)
The difference between that and direct iteration is subtle. The type of each element in the nditer case is <class 'numpy.ndarray'>. In the direct iteration case it is <class 'numpy.void'>.
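To make the difference concrete, here is a small sketch (assuming a two-record array with the same dtype; the variable names are only illustrative) that records the element type seen by each style of iteration and shows how the 'pre' values can be reached in both cases:

```python
import numpy as np

dt = np.dtype([('pre', '<i4', (2,)), ('data', '<f8')])
arr = np.array([([65536, 65536], 0.2), ([65536, 1], 1.33566434)], dtype=dt)

# nditer yields 0-d arrays that keep the structured dtype.
nditer_types = []
nditer_second = []
for x in np.nditer(arr):
    nditer_types.append(type(x))
    nditer_second.append(int(x['pre'][1]))   # index the field by name

# Direct iteration yields numpy.void records.
direct_types = []
direct_second = []
for row in arr:
    direct_types.append(type(row))
    direct_second.append(int(row['pre'][1]))  # row[0][1] also works

# Plain indexing on the array itself, as in the question.
first_pre0 = int(arr[0][0][0])
```

With a numpy.void record, fields can be reached by position (row[0]) or by name (row['pre']); with the 0-d array from nditer, indexing by field name is the straightforward route.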
As for the sorting, it sounds like you want np.lexsort using the 2 columns of the 'pre' field:
In [76]: np.lexsort((arr['pre'][:,1], arr['pre'][:,0]))
Out[76]: array([5, 3, 4, 1, 2, 0])
In [77]: arr[_]
Out[77]:
array([([65535, 249], 0.07692308), ([65535, 479], 0.33333333),
([65535, 2295], 0.09090909), ([65536, 1], 1.33566434),
([65536, 2], 2.06068931), ([65536, 65536], 0.2 )],
dtype=[('pre', '<i4', (2,)), ('data', '<f8')])
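The same steps can be wrapped in a small helper. This is only a sketch: sort_by_pre is a hypothetical name, not a NumPy function. The point to keep in mind is that np.lexsort treats the last key in the tuple as the primary key, which is why the second column is passed first:

```python
import numpy as np

dt = np.dtype([('pre', '<i4', (2,)), ('data', '<f8')])
arr = np.array([([65536, 65536], 0.2), ([65536, 1], 1.33566434),
                ([65536, 2], 2.06068931), ([65535, 479], 0.33333333),
                ([65535, 2295], 0.09090909), ([65535, 249], 0.07692308)],
               dtype=dt)

def sort_by_pre(records):
    """Return records sorted by pre[:, 0], with ties broken by pre[:, 1]."""
    pre = records['pre']
    # The last key given to lexsort is the primary sort key.
    order = np.lexsort((pre[:, 1], pre[:, 0]))
    return records[order]

sorted_arr = sort_by_pre(arr)
```

The index array is computed in one vectorized call, and records[order] builds the sorted copy in a single fancy-indexing step, so no Python-level loop is involved.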
A similar lexsort was just recommended in the question "numpy sort 2d: rearrange rows without changing values in row".
Upvotes: 1