Dan

Reputation: 13

ufunc (min, max, mean, etc.) on structured (record) arrays with mixed dtypes

I am working in Python 3.8 with NumPy 1.20.3 and trying to apply simple reductions (min, max, mean) to a structured array whose fields have different data types.

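import numpy

def test_large_record():
    # numpy.float and numpy.int are deprecated aliases; use explicit dtypes matching '<f8' and '<i8'
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    # Reducing the whole structured array at once is what raises the TypeError
    print(rec_array.min())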

This raises "TypeError: cannot perform reduce with flexible type".

I then tried writing a generator that walks a generic structured array, groups the fields by data type, and yields a view of each group as a plain array of that type, but that doesn't seem to work.

def rec_homogeneous_generator(rec_array):
    dtype = {}

    for name, dt in rec_array.dtype.descr:
        if dt not in dtype.keys():
            dtype[dt] = []

        dtype[dt].append(name)

    for dt, cols in dtype.items():
        r = rec_array[cols]
        v = r.view(dt)
        yield v


def test_large_record():
    x = numpy.array([0.0, 0.2, 0.3], dtype=numpy.float64)
    x_2 = numpy.array([0.01, 0.12, 0.82], dtype=numpy.float64)
    y = numpy.array([1, 5, 7], dtype=numpy.int64)
    rec_array = numpy.rec.fromarrays([x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')])

    for h_array in rec_homogeneous_generator(rec_array):
        print(h_array.min(axis=0))

This prints 0.0 and 0, which is not what I expected. I should get [0, 0.01] for the float fields and 1 for the integer field.

Does anyone have any good ideas?

Upvotes: 0

Views: 150

Answers (1)

hpaulj

Reputation: 231335

Operating on one field at a time:

In [21]: [rec_array[field].min() for field in rec_array.dtype.fields]
Out[21]: [0.0, 0.01, 1]
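
The per-field approach generalizes to any reduction. A minimal sketch, assuming the same data as in the question; the helper name reduce_fields is hypothetical, not part of NumPy:

```python
import numpy as np

# Same data as in the question, with float64/int64 spelled out explicitly.
x = np.array([0.0, 0.2, 0.3], dtype=np.float64)
x_2 = np.array([0.01, 0.12, 0.82], dtype=np.float64)
y = np.array([1, 5, 7], dtype=np.int64)
rec_array = np.rec.fromarrays(
    [x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')]
)


def reduce_fields(arr, func):
    """Apply func (e.g. np.min) to each field separately, keyed by field name.

    Hypothetical helper for illustration only.
    """
    return {name: func(arr[name]) for name in arr.dtype.names}


field_min = reduce_fields(rec_array, np.min)
field_max = reduce_fields(rec_array, np.max)
field_mean = reduce_fields(rec_array, np.mean)
print(field_min, field_max, field_mean)
```

Each field is a plain homogeneous array, so the reduction never sees the structured ("flexible") dtype that triggers the TypeError.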

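In recent NumPy versions (1.16 and later), your multi-field indexing followed by view() also reinterprets the bytes of the fields you did not select, so each result has 9 elements instead of the expected 6 or 3: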

In [23]: list(rec_homogeneous_generator(rec_array))
Out[23]: 
[rec.array([0.0e+000, 1.0e-002, 4.9e-324, 2.0e-001, 1.2e-001, 2.5e-323,
            3.0e-001, 8.2e-001, 3.5e-323],
           dtype=float64),
 rec.array([                  0, 4576918229304087675,                   1,
            4596373779694328218, 4593311331947716280,                   5,
            4599075939470750515, 4605561122934164029,                   7],
           dtype=int64)]

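Multi-field indexing returns a view that keeps the original offsets and itemsize (24 bytes per record here, although the two selected fields occupy only 16):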

In [25]: rec_array[['x','x_2']]
Out[25]: 
rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
          dtype={'names':['x','x_2'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':24})
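
The effect of that leftover padding can be checked directly. A small sketch, assuming NumPy 1.16 or later (where multi-field indexing returns a view) and the same data as in the question:

```python
import numpy as np
import numpy.lib.recfunctions as rf

rec_array = np.rec.fromarrays(
    [np.array([0.0, 0.2, 0.3]), np.array([0.01, 0.12, 0.82]), np.array([1, 5, 7], dtype=np.int64)],
    dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')],
)

sub = rec_array[['x', 'x_2']]      # view: still 24 bytes per record
packed = rf.repack_fields(sub)     # copy without the unused 8 bytes

sub_itemsize = sub.dtype.itemsize          # includes the bytes of the unselected 'y' field
packed_itemsize = packed.dtype.itemsize    # only the two float fields

# Viewing as float64 reinterprets every byte of each record,
# so the unpacked view picks up three extra (garbage) values from 'y'.
n_sub = sub.view('<f8').size
n_packed = packed.view('<f8').size
print(sub_itemsize, packed_itemsize, n_sub, n_packed)
```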

repack_fields removes the padding left behind by multi-field indexing:

In [26]: import numpy.lib.recfunctions as rf
In [28]: rf.repack_fields(rec_array[['x','x_2']])
Out[28]: 
rec.array([(0. , 0.01), (0.2, 0.12), (0.3, 0.82)],
          dtype=[('x', '<f8'), ('x_2', '<f8')])

Now the packed array can safely be viewed as float:

In [29]: rf.repack_fields(rec_array[['x','x_2']]).view(float)
Out[29]: 
rec.array([0.  , 0.01, 0.2 , 0.12, 0.3 , 0.82],
          dtype=float64)

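This view is 1-D, so the values of the two fields are interleaved.

Better yet, structured_to_unstructured returns a 2-D array with one column per field: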

In [30]: rf.structured_to_unstructured(rec_array[['x','x_2']])
Out[30]: 
rec.array([[0.  , 0.01],
           [0.2 , 0.12],
           [0.3 , 0.82]],
          dtype=float64)

These functions (from numpy.lib.recfunctions) are documented on the NumPy structured arrays page.
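
Putting this together, the generator from the question could be rewritten along these lines. This is a sketch rather than a definitive implementation: it assumes grouping by each field's dtype is what is wanted, and it yields the field names alongside each block (a change from the original signature) so the results can be labeled.

```python
import numpy as np
import numpy.lib.recfunctions as rf


def rec_homogeneous_generator(rec_array):
    """Yield (field_names, 2-D ndarray) for each group of fields sharing a dtype."""
    groups = {}
    for name in rec_array.dtype.names:
        # dtype objects are hashable, so they can serve as grouping keys
        groups.setdefault(rec_array.dtype[name], []).append(name)

    for names in groups.values():
        # structured_to_unstructured copes with the padding that
        # multi-field indexing leaves behind; np.asarray drops the recarray subclass
        yield names, np.asarray(rf.structured_to_unstructured(rec_array[names]))


x = np.array([0.0, 0.2, 0.3], dtype=np.float64)
x_2 = np.array([0.01, 0.12, 0.82], dtype=np.float64)
y = np.array([1, 5, 7], dtype=np.int64)
rec_array = np.rec.fromarrays(
    [x, x_2, y], dtype=[('x', '<f8'), ('x_2', '<f8'), ('y', '<i8')]
)

# One row of minima per dtype group, one entry per field in that group
mins = {
    tuple(names): block.min(axis=0).tolist()
    for names, block in rec_homogeneous_generator(rec_array)
}
print(mins)
```

With the question's data, the float group gives the column minima for x and x_2, and the integer group gives the minimum of y, which matches the per-field result above.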

Upvotes: 0
