lmm_5000

Reputation: 149

Structured NumPy array: how to assign the dtype correctly

My attempts to create structured arrays are failing:

I create an array of shape (4, 5):

import numpy as np

sample = np.array([[0.01627555, 1.55885081, 1.99043222, 0.00898849, 1.43987417],
                   [0.01875182, 0.97853587, 2.09924081, 0.00474326, 1.31002428],
                   [0.01905054, 1.74849054, 1.78033106, 0.01303594, 1.28518933],
                   [0.01753927, 1.22486495, 1.88287677, 0.01823483, 1.36472148]])

Then I assign a dtype to each of the five columns:

sample.dtype = [('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('f', 'f4'), ('g', 'f4' )]

I would expect:

sample['X']
> array([0.01627555, 0.01875182, 0.01905054, 0.01753927], dtype=float32)

However, what I'm doing gives

array([[-1.6328180e-12,  1.9988040e+00],
       [ 7.9082486e+13,  2.0124049e+00],
       [ 7.7365790e-24,  1.9725413e+00],
       [ 3.6306835e+36,  1.9853595e+00]], dtype=float32)

I know how to do this in pandas, etc., but I need a structured array on this occasion. What am I doing wrong?

Upvotes: 2

Views: 144

Answers (2)

hpaulj

Reputation: 231335

The main structured array documentation page introduces recfunctions:

https://numpy.org/doc/stable/user/basics.rec.html#module-numpy.lib.recfunctions

In [2]: import numpy.lib.recfunctions as rf

Define the dtype object:

In [11]: dt = np.dtype([('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('f', 'f4'), ('g', 'f4' )])                              

rf has a function designed to perform this kind of conversion:

In [13]: arr = rf.unstructured_to_structured(sample, dtype=dt)
In [14]: arr
Out[14]:
array([(0.01627555, 1.5588508, 1.9904323, 0.00898849, 1.4398742),
       (0.01875182, 0.9785359, 2.0992408, 0.00474326, 1.3100243),
       (0.01905054, 1.7484906, 1.780331 , 0.01303594, 1.2851893),
       (0.01753927, 1.224865 , 1.8828768, 0.01823483, 1.3647215)],
      dtype=[('X', '<f4'), ('Y', '<f4'), ('Z', '<f4'), ('f', '<f4'), ('g', '<f4')])
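
As a side note on why assigning to sample.dtype gave those strange values: the assignment reinterprets the existing float64 bytes instead of converting them, so each 40-byte row is read as two 20-byte records. A minimal sketch of that effect, assuming a C-contiguous float64 input (the names bad and good are only for illustration):

```python
import numpy as np

sample = np.array([[0.01627555, 1.55885081, 1.99043222, 0.00898849, 1.43987417],
                   [0.01875182, 0.97853587, 2.09924081, 0.00474326, 1.31002428],
                   [0.01905054, 1.74849054, 1.78033106, 0.01303594, 1.28518933],
                   [0.01753927, 1.22486495, 1.88287677, 0.01823483, 1.36472148]])
dt = np.dtype([('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('f', 'f4'), ('g', 'f4')])

# Reinterpreting float64 bytes: every 40-byte row splits into two 20-byte records,
# so the field values are meaningless bit patterns.
bad = sample.view(dt)
print(bad.shape)     # (4, 2)

# Casting to float32 first makes each row exactly one 20-byte record,
# so the view lines up with the columns.
good = sample.astype('f4').view(dt).reshape(-1)
print(good.shape)    # (4,)
print(good['X'])
```

The view-based version only works here because all five fields share one type; for mixed field types, rf.unstructured_to_structured is the safer choice.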

You can also create a structured array with a list of tuples, which will look a lot like the above displays of the array.

In [16]: np.array([tuple(x) for x in sample], dtype=dt)

The other answer uses a records function to make the array from a list of arrays, one per field (np.rec is the public name for np.core.records; the np.core namespace is deprecated as of NumPy 2.0):

np.rec.fromarrays(list(sample.T), dtype=dt)

Like many rf functions, this function creates a "blank" array with the desired dtype and shape, and copies values to it by field name:

In [31]: dt.names                                                                                                       
Out[31]: ('X', 'Y', 'Z', 'f', 'g')                                                                                      
In [32]: res = np.zeros(4, dtype=dt)                                                                                    
In [33]: for i,name in enumerate(dt.names):                                                                                 
    ...:     res[name] = sample[:,i] 
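
The copy-by-field loop above can be wrapped up as a small function. This is only a sketch: columns_to_structured is a hypothetical helper name, not part of NumPy, and it assumes a 2-D input with one column per field. The result is checked against rf.unstructured_to_structured.

```python
import numpy as np
import numpy.lib.recfunctions as rf


def columns_to_structured(arr2d, dtype):
    """Copy each column of a 2-D array into the matching field (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if arr2d.ndim != 2 or arr2d.shape[1] != len(dtype.names):
        raise ValueError("need a 2-D array with one column per field")
    out = np.zeros(arr2d.shape[0], dtype=dtype)   # "blank" array, one record per row
    for i, name in enumerate(dtype.names):
        out[name] = arr2d[:, i]                   # values are cast to the field's type
    return out


sample = np.array([[0.01627555, 1.55885081, 1.99043222, 0.00898849, 1.43987417],
                   [0.01875182, 0.97853587, 2.09924081, 0.00474326, 1.31002428],
                   [0.01905054, 1.74849054, 1.78033106, 0.01303594, 1.28518933],
                   [0.01753927, 1.22486495, 1.88287677, 0.01823483, 1.36472148]])
dt = np.dtype([('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('f', 'f4'), ('g', 'f4')])

res = columns_to_structured(sample, dt)
expected = rf.unstructured_to_structured(sample, dtype=dt)

# Compare field by field against the recfunctions result.
same = all(np.array_equal(res[name], expected[name]) for name in dt.names)
print(res['X'], same)
```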

Upvotes: 1

Corralien

Reputation: 120391

You can pass the columns of your array, as a tuple of 1-D arrays, to np.rec.array (the public name for np.core.records.array) to convert it to a recarray:

dtype = [('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('f', 'f4'), ('g', 'f4')]

arr = np.rec.array(tuple(sample.T), dtype=dtype)

Output:

>>> arr  # note the recarray
rec.array([(0.01627555, 1.5588508, 1.9904323, 0.00898849, 1.4398742),
           (0.01875182, 0.9785359, 2.0992408, 0.00474326, 1.3100243),
           (0.01905054, 1.7484906, 1.780331 , 0.01303594, 1.2851893),
           (0.01753927, 1.224865 , 1.8828768, 0.01823483, 1.3647215)],
          dtype=[('X', '<f4'), ('Y', '<f4'), ('Z', '<f4'), ('f', '<f4'), ('g', '<f4')])

>>> arr['X']  # note you still have ndarray
array([0.01627555, 0.01875182, 0.01905054, 0.01753927], dtype=float32)
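
One practical difference of a recarray is that fields can also be read as attributes. A short sketch, assuming the same sample data as in the question and using the public np.rec namespace (the variable name rec is only for illustration):

```python
import numpy as np

sample = np.array([[0.01627555, 1.55885081, 1.99043222, 0.00898849, 1.43987417],
                   [0.01875182, 0.97853587, 2.09924081, 0.00474326, 1.31002428],
                   [0.01905054, 1.74849054, 1.78033106, 0.01303594, 1.28518933],
                   [0.01753927, 1.22486495, 1.88287677, 0.01823483, 1.36472148]])
dtype = [('X', 'f4'), ('Y', 'f4'), ('Z', 'f4'), ('f', 'f4'), ('g', 'f4')]

# One 1-D array per field, taken from the columns of sample.
rec = np.rec.array(tuple(sample.T), dtype=dtype)

print(type(rec).__name__)      # recarray
print(rec.X)                   # attribute access, same data as rec['X']
print(type(rec.X).__name__)    # ndarray
```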

Upvotes: 2
