How to sort complex structured data with numpy?

Question

I have a file whose lines consist of 2 integers and one float. I read the file with numpy:

import numpy as np

dt = np.dtype([('pre', np.dtype('i4'), 2), ('data', np.float64)])
a = np.fromfile("myfile", dtype=dt)

array([([65536, 65536], 0.2       ), ([65536,     1], 1.33566434),
       ([65536,     2], 2.06068931), ..., ([65535,   479], 0.33333333),
       ([65535,  2295], 0.09090909), ([65535,   249], 0.07692308)],
      dtype=[('pre', '<i4', (2,)), ('data', '<f8')])

I actually have two questions:
First: when I iterate over a with np.nditer, I can't access a[0][0][0], for example. Why is that, and how should np.nditer be used?
Second: how can I sort the elements by the first entry in the ['pre'] list and then by the second entry in ['pre']?
The desired output would look like:



array([([1, 1], 0.2       ), ([1,     2], 1.33566434),
       ([1,     3], 2.06068931), ..., ([2,   1], 0.33333333),
       ([2,  2], 0.09090909), ([2,   3], 0.07692308)],
      dtype=[('pre', '<i4', (2,)), ('data', '<f8')])

Any suggestions are welcome, even if changing the data type for reading the file would help. Performance is needed as well because the file I have is very large.
Thanks

hpaulj · Accepted Answer

You have a 1d structured array:

In [56]: arr = np.array([([65536, 65536], 0.2       ), ([65536,     1], 1.33566434),
    ...:        ([65536,     2], 2.06068931), ([65535,   479], 0.33333333),
    ...:        ([65535,  2295], 0.09090909), ([65535,   249], 0.07692308)],
    ...:       dtype=[('pre', '<i4', (2,)), ('data', '<f8')])

It has 2 fields.  The 'pre' field holds 2 elements per record, so arr['pre'] is a 2d numeric array.
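
To make the field layout concrete, here is a small sketch using a shortened copy of the sample data (the three records and the variable names are only for illustration). It shows the shapes involved, and that positional indexing such as arr[0][0][0] works on the array itself:

```python
import numpy as np

# Same layout as in the question: a 2-element int32 subarray plus one float64.
dt = np.dtype([('pre', 'i4', (2,)), ('data', 'f8')])
arr = np.array([([65536, 65536], 0.2),
                ([65536, 1], 1.33566434),
                ([65535, 479], 0.33333333)], dtype=dt)

pre = arr['pre']    # 2d integer array, one row per record
data = arr['data']  # 1d float array

print(arr.shape)    # (3,)
print(pre.shape)    # (3, 2)
print(data.shape)   # (3,)

# Record 0 -> field 0 ('pre') -> element 0 of that field.
first = arr[0][0][0]
print(first)        # 65536
```

Selecting by field name first (arr['pre'][0, 0]) gives the same value and is usually easier to read.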

As a general rule you don't need nditer to iterate through an array.  It's useful when developing Cython code, but isn't needed in Python code.

If you use nditer you get a () shape array with the original dtype:

In [70]: for x in np.nditer(arr):
    ...:     print(x)

([65536, 65536], 0.2)
([65536,     1], 1.33566434)
([65536,     2], 2.06068931)
([65535,   479], 0.33333333)
([65535,  2295], 0.09090909)
([65535,   249], 0.07692308)


The difference from direct iteration is subtle.  The type in the nditer case is numpy.ndarray (a 0d array); in the direct iteration case it is numpy.void.
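
The type difference can be checked with a short sketch (two sample records, illustrative variable names). It is presumably also why a[0][0][0]-style access fails inside the nditer loop: a 0d array accepts field names but not integer indices, while the records from direct iteration accept both.

```python
import numpy as np

dt = np.dtype([('pre', 'i4', (2,)), ('data', 'f8')])
arr = np.array([([65536, 65536], 0.2),
                ([65536, 1], 1.33566434)], dtype=dt)

# nditer yields 0d arrays that keep the structured dtype;
# index them by field name, then by position within the field.
nditer_items = [(type(x), x.shape, int(x['pre'][0])) for x in np.nditer(arr)]

# Direct iteration yields numpy.void records, which also accept
# positional indexing: rec[0] is the 'pre' field, rec[0][0] its first element.
direct_items = [(type(rec), int(rec[0][0])) for rec in arr]

print(nditer_items[0][:2])  # the type and shape seen inside the nditer loop
print(direct_items[0][0])   # the type seen in direct iteration
```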

As for the sorting, it sounds like you want np.lexsort using the 2 columns of the 'pre' field:

In [76]: np.lexsort((arr['pre'][:,1], arr['pre'][:,0]))
Out[76]: array([5, 3, 4, 1, 2, 0])
In [77]: arr[_]
Out[77]: 
array([([65535,   249], 0.07692308), ([65535,   479], 0.33333333),
       ([65535,  2295], 0.09090909), ([65536,     1], 1.33566434),
       ([65536,     2], 2.06068931), ([65536, 65536], 0.2       )],
      dtype=[('pre', '<i4', (2,)), ('data', '<f8')])


A similar lexsort was just recommended for the question "numpy sort 2d: rearrange rows without changing values in row".
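
As a self-contained version of the lexsort approach, the sketch below also tries the "change the data type" idea from the question. The flat field names 'a' and 'b' are made up for illustration, and the copy into the flat array is just one way to do the conversion. Note that lexsort treats the last key as the primary one.

```python
import numpy as np

dt = np.dtype([('pre', 'i4', (2,)), ('data', 'f8')])
arr = np.array([([65536, 65536], 0.2), ([65536, 1], 1.33566434),
                ([65536, 2], 2.06068931), ([65535, 479], 0.33333333),
                ([65535, 2295], 0.09090909), ([65535, 249], 0.07692308)],
               dtype=dt)

# lexsort: the LAST key is the primary one, so pre[:, 0] goes last.
idx = np.lexsort((arr['pre'][:, 1], arr['pre'][:, 0]))
sorted_arr = arr[idx]

# Alternative (assumption: separate scalar fields are acceptable):
# copy into a flat dtype with hypothetical field names 'a' and 'b',
# then sort by those fields with np.sort(..., order=...).
flat_dt = np.dtype([('a', 'i4'), ('b', 'i4'), ('data', 'f8')])
flat = np.empty(arr.shape, dtype=flat_dt)
flat['a'] = arr['pre'][:, 0]
flat['b'] = arr['pre'][:, 1]
flat['data'] = arr['data']
flat_sorted = np.sort(flat, order=['a', 'b'])

print(idx)                  # [5 3 4 1 2 0]
print(flat_sorted['data'])  # same order as sorted_arr['data']
```

Both give the same ordering here. For a very large file it is worth timing the two; the lexsort version avoids the extra copy and works on plain integer columns, so it may well be the faster one.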
