a11

Reputation: 3396

How to create nested rec arrays

Given the following arrays:

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

How can I combine them into a recarray (or structured array, same thing) that looks like this: [('a', 'b', 'c'), (0.4, 0.5, 0.6), (1.1, 2.1, 3.1), (17.2)], and where print(arr["name"]) returns ('a', 'b', 'c')?

The actual data has a dozen arrays. There is always one array (b) that has a size of one; the others all share the same size, but that size varies from case to case. So I'm looking for a solution that extends to these conditions. Thank you.

Upvotes: 1

Views: 74

Answers (2)

hpaulj

Reputation: 231335

Define a dtype:

In [41]: dt = np.dtype([('name','U10'),('val','f'),('alt','f'),('b','f')])

make a zeros array of the desired shape and dtype:

In [43]: arr = np.zeros(3, dt)

Copy the arrays to their respective fields:

In [44]: arr['name']=name; arr['val']=val; arr['alt']=alt    
In [45]: arr['b']=b

And the result:

In [46]: arr
Out[46]: 
array([('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1, 17.2),
       ('c', 0.6, 3.1, 17.2)],
      dtype=[('name', '<U10'), ('val', '<f4'), ('alt', '<f4'), ('b', '<f4')])

That looks different from what you want, but it is a valid structured array. The layout you show isn't one, because every record in a structured array must have the same fields. Access by field name still does what you want:

In [47]: arr['name']
Out[47]: array(['a', 'b', 'c'], dtype='<U10')

The b values have been replicated. You can't make a "ragged" structured array:

In [48]: arr['b']
Out[48]: array([17.2, 17.2, 17.2], dtype=float32)
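Since the question mentions a dozen arrays, the steps above can be wrapped in a small helper. This is only a sketch: build_structured is a hypothetical name, not a NumPy function, and it assumes every input is 1-D and either has the common length or length one.

```python
import numpy as np

def build_structured(columns):
    """Pack a dict of 1-D arrays into one structured array (hypothetical helper)."""
    # The longest input sets the number of records.
    n = max(len(col) for col in columns.values())
    # Reuse each array's own dtype so nothing is silently downcast.
    dt = np.dtype([(key, col.dtype) for key, col in columns.items()])
    out = np.zeros(n, dtype=dt)
    for key, col in columns.items():
        out[key] = col  # a length-1 array is broadcast to all n records
    return out

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

arr = build_structured({'name': name, 'val': val, 'alt': alt, 'b': b})
print(arr['name'])
print(arr['b'])
```

Because the dtype is taken from the inputs, val, alt and b stay float64 here instead of the float32 ('f') used in the session above.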

The other answer creates a dict, which gives the same access by key but is a different structure. It may still be what you really want.

There are some helper functions that create a recarray from a set of arrays, but their action amounts to the same thing. And they (probably) won't work directly with the single element b.
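One such helper is np.rec.fromarrays. The sketch below assumes the helper wants equal-length inputs, so b is expanded to the common length first; b_full is just an illustrative name.

```python
import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

# Expand the single b value to the common length before combining.
b_full = np.full(name.shape, b[0])

# Build a recarray with one field per input array.
rec = np.rec.fromarrays([name, val, alt, b_full], names='name,val,alt,b')

print(rec['name'])  # field access by key, as with a structured array
print(rec.val)      # recarrays also allow attribute access
```

The result holds the same replicated b column as the zeros-and-assign approach; only the construction differs.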

You could make the list of tuples with:

In [53]: from itertools import zip_longest
In [54]: [ijk for ijk in zip_longest(name,val,alt,b)]
Out[54]: [('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1, None), ('c', 0.6, 3.1, None)]
In [55]: np.array(_, dt)
Out[55]: 
array([('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1,  nan),
       ('c', 0.6, 3.1,  nan)],
      dtype=[('name', '<U10'), ('val', '<f4'), ('alt', '<f4'), ('b', '<f4')])

Though the b fill of None/nan may not be what you want.
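If the None/nan fill is unwanted, zip_longest accepts a fillvalue. A minimal sketch, assuming b is the only input shorter than the rest (the fill value is applied to any short input, so this would be wrong otherwise):

```python
from itertools import zip_longest

import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

dt = np.dtype([('name', 'U10'), ('val', 'f'), ('alt', 'f'), ('b', 'f')])

# Pad the short input (b) with its own single value instead of None.
rows = list(zip_longest(name, val, alt, b, fillvalue=b[0]))
arr = np.array(rows, dtype=dt)

print(arr['b'])
```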

You could combine the arrays into one object dtype array, which keeps b at length one, but then the elements are not accessible by name; access by name requires a dict. The object array looks like this:

In [64]: barr = np.array([name, val, alt, b], dtype=object)
In [65]: barr
Out[65]: 
array([array(['a', 'b', 'c'], dtype='<U1'), array([0.4, 0.5, 0.6]),
       array([1.1, 2.1, 3.1]), array([17.2])], dtype=object)

Upvotes: 2

Lover of Structure

Reputation: 1848

The following solution produces output that closely matches what you say you desire (but it's not a NumPy record array):

import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

# Map each field name to its array explicitly (avoids eval)
arr = {'name': name, 'val': val, 'alt': alt, 'b': b}

print(arr["name"])

This prints ['a' 'b' 'c']. Note that arr here is a simple dictionary.


An alternative that builds a NumPy structured array (which can be viewed as a numpy.recarray with arr.view(np.recarray)) would be the following:

import numpy as np

# initialization
name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

# processing
b = np.repeat(b, len(name))  # make b as long as the other arrays
fields = ['name', 'val', 'alt', 'b']
dt = np.dtype([('name', '<U12')] + [(colname, 'f8') for colname in fields[1:]])
arr = np.array(list(zip(name, val, alt, b)), dt)

print(arr["name"])  # output: ['a' 'b' 'c']

Here, arr evaluates to the following:

array([('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1, 17.2),
       ('c', 0.6, 3.1, 17.2)],
      dtype=[('name', '<U12'), ('val', '<f8'), ('alt', '<f8'), ('b', '<f8')])
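The array above is a plain structured array. If attribute access is wanted, it can be viewed as a recarray without copying the data. A short sketch that rebuilds the same array for self-containment:

```python
import numpy as np

dt = np.dtype([('name', 'U12'), ('val', 'f8'), ('alt', 'f8'), ('b', 'f8')])
arr = np.array([('a', 0.4, 1.1, 17.2),
                ('b', 0.5, 2.1, 17.2),
                ('c', 0.6, 3.1, 17.2)], dtype=dt)

# View the same memory as a recarray; no data is copied.
rec = arr.view(np.recarray)

print(rec.val)       # attribute access to a field
print(rec['name'])   # key access still works
```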

Upvotes: 0
