user1121588

Methods of creating a structured array

I have the following information and I can produce a numpy array of the desired structure. Note that the values x and y have to be determined separately since their ranges may differ so I cannot use:

xy = np.random.random_integers(0,10,size=(N,2))

The extra list(...) conversion is needed for this to work in Python 3.4; it is unnecessary, but harmless, in Python 2.7.

The following works:

>>> # attempts to formulate [id,(x,y)] with specified dtype 
>>> N = 10
>>> x = np.random.random_integers(0,10,size=N)
>>> y = np.random.random_integers(0,10,size=N)
>>> id = np.arange(N)
>>> dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
>>> arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
>>> arr
    array([(0, [7.0, 7.0]), (1, [7.0, 7.0]), (2, [5.0, 5.0]), (3, [0.0, 0.0]),
           (4, [6.0, 6.0]), (5, [6.0, 6.0]), (6, [7.0, 7.0]),
           (7, [10.0, 10.0]), (8, [3.0, 3.0]), (9, [7.0, 7.0])], 
          dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])

I cleverly thought I could circumvent the above nasty bits by simply creating the array in the desired vertical structure and applying my dtype to it, hoping that it would work. The stacked array is correct in the vertical form:

>>> a = np.vstack((id,x,y)).T
>>> a
array([[ 0,  7,  6],
       [ 1,  7,  7],
       [ 2,  5,  9],
       [ 3,  0,  1],    
       [ 4,  6,  1],
       [ 5,  6,  6],
       [ 6,  7,  6],
       [ 7, 10,  9],
       [ 8,  3,  2],
       [ 9,  7,  8]])

I tried several ways of reformulating the above array so that my dtype would work, and I just can't figure it out (this included vstacking a vstack, etc.). So my question is: how can I take the vstack version and get it into a format that meets my dtype requirements without going through the procedure I used above? I am hoping it is obvious, but I have sliced, stacked and ellipsed myself into an endless loop.

SUMMARY

Many thanks to hpaulj. I have included two incarnations based upon his suggestions for others to consider. The pure numpy solution is substantially faster and a lot cleaner.

"""
Script:  pnts_StackExch
Author:  [email protected]
Modified: 2015-08-24
Purpose: 
    To provide some timing options on point creation in preparation for
    point-to-point distance calculations using einsum.
Reference:
    http://stackoverflow.com/questions/32224220/
    methods-of-creating-a-structured-array
Functions:
    decorators:  delta_time, arg_deco
    main:  pnts_IdShape, alternate
"""
import numpy as np
import time
from functools import wraps

np.set_printoptions(edgeitems=5,linewidth=75,precision=2,suppress=True,threshold=5)

# .... wrapper funcs .............
def delta_time(func):
    """timing decorator function"""
    import time
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("\nTiming function for... {}".format(func.__name__))
        t0 = time.time()                # start time
        result = func(*args, **kwargs)  # ... run the function ...
        t1 = time.time()                # end time
        print("Results for... {}".format(func.__name__))
        print("  time taken ...{:12.9f} sec.".format(t1-t0))
        #print("\n  print results inside wrapper or use <return> ... ")
        return result                   # return the result of the function
    return wrapper

def arg_deco(func):
    """This wrapper just prints some basic function information."""
    @wraps(func)
    def wrapper(*args,**kwargs):
        print("Function... {}".format(func.__name__))
        #print("File....... {}".format(func.__code__.co_filename))
        print("  args.... {}\n  kwargs. {}".format(args,kwargs))
        #print("  docs.... {}\n".format(func.__doc__))
        return func(*args, **kwargs)
    return wrapper

# .... main funcs ................
@delta_time
@arg_deco
def pnts_IdShape(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
    """Make N points from uniformly distributed random integers,
       with optional min/max values for Xs and Ys (bounds inclusive)
    """
    dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
    IDs = np.arange(0,N)
    Xs = np.random.random_integers(x_min,x_max,size=N)
    Ys = np.random.random_integers(y_min,y_max,size=N)
    # row-by-row fill: one (id, [x, y]) tuple per record
    a = np.array([(i,j) for i,j in zip(IDs,np.column_stack((Xs,Ys)))],dt)
    return IDs,Xs,Ys,a

@delta_time
@arg_deco
def alternate(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
    """Field-by-field fill, after hpaulj's suggestion and his changes
       to pnts_IdShape above.
    """
    dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
    IDs = np.arange(0,N)
    Xs = np.random.random_integers(x_min,x_max,size=N)  # use the arguments, not fixed 0..10
    Ys = np.random.random_integers(y_min,y_max,size=N)
    c_stack = np.column_stack((IDs,Xs,Ys))
    a = np.ones(N, dtype=dt)
    a['ID'] = c_stack[:,0]
    a['Shape'] = c_stack[:,1:]
    return IDs,Xs,Ys,a

if __name__=="__main__":
    """time testing for various methods
    """
    id_1,xs_1,ys_1,a_1 = pnts_IdShape(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
    id_2,xs_2,ys_2,a_2 = alternate(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10) 

Timing results for 1,000,000 points are as follows:

Timing function for... pnts_IdShape
Function... pnts_IdShape
  args.... ()
  kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... pnts_IdShape
  time taken ... 0.680652857 sec.

Timing function for... alternate
Function... alternate
  args.... ()
  kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... alternate
  time taken ... 0.060056925 sec.
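
For anyone who wants to repeat the comparison without the decorators, here is a minimal sketch using the standard-library timeit module. The function names by_rows and by_field are made up for this example, N is reduced to keep the run short, and np.random.default_rng stands in for the deprecated random_integers (endpoint=True keeps the upper bound inclusive). Absolute times will differ by machine and NumPy version.

```python
import timeit
import numpy as np

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

def by_rows(ids, xs, ys):
    """Row-wise fill: build a list of (id, [x, y]) tuples, then convert."""
    return np.array(list(zip(ids, np.column_stack((xs, ys)))), dtype=dt)

def by_field(ids, xs, ys):
    """Field-wise fill: allocate once, then assign each field."""
    out = np.empty(len(ids), dtype=dt)
    out['ID'] = ids
    out['Shape'][:, 0] = xs
    out['Shape'][:, 1] = ys
    return out

rng = np.random.default_rng(0)      # seeded only so runs are repeatable
N = 100_000                         # smaller than the 1,000,000 used above
ids = np.arange(N)
xs = rng.integers(0, 10, size=N, endpoint=True)
ys = rng.integers(0, 10, size=N, endpoint=True)

for func in (by_rows, by_field):
    # best of three single runs
    best = min(timeit.repeat(lambda: func(ids, xs, ys), number=1, repeat=3))
    print("{:10s} {:.6f} sec".format(func.__name__, best))
```

Both functions should return identical arrays; only the construction path differs.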

Upvotes: 2

Views: 180

Answers (1)

hpaulj

Reputation: 231570

There are 2 ways of filling a structured array (http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays) - by row (or rows with list of tuples), and by field.

To do this by field, create an empty structured array and assign values by field name:

In [19]: a=np.column_stack((id,x,y))
# same as your vstack().T

In [20]: Y=np.zeros(a.shape[0], dtype=dt)
# empty, ones, etc
In [21]: Y['ID'] = a[:,0]
In [22]: Y['Shape'] = a[:,1:]
# (2,) field takes a 2 column array
In [23]: Y
Out[23]: 
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])], 
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
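
The session above as a self-contained script, in case it helps. This is only a sketch: the seeded Generator and the differing x and y ranges are my assumptions (the ranges are there to mirror the requirement in the question), and the name out replaces Y.

```python
import numpy as np

# Assumption: a seeded Generator instead of np.random.random_integers;
# endpoint=True keeps the upper bound inclusive, as random_integers did.
rng = np.random.default_rng(0)

N = 10
ids = np.arange(N)
x = rng.integers(0, 10, size=N, endpoint=True)
y = rng.integers(0, 20, size=N, endpoint=True)   # a different range from x

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

a = np.column_stack((ids, x, y))   # shape (N, 3), all integers
out = np.zeros(N, dtype=dt)        # one record per row of a
out['ID'] = a[:, 0]                # integers cast to i4
out['Shape'] = a[:, 1:]            # (N, 2) integers cast to f8

print(out.shape, out['Shape'].shape)
```

The casting from integer to float happens during the field assignment, which is why no intermediate list of tuples is needed.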

On the surface

arr = np.array(list(zip(id,np.hstack((x,y)))),dt)

looks like an ok way of constructing the list of tuples needed to fill the array. But the result duplicates the values of x instead of using y. I'll have to look at what is wrong.
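
A small sketch of what appears to be going on, using made-up three-element inputs: hstack joins x and y end to end into one 1-D array, zip stops when id runs out, and each remaining scalar is broadcast across both slots of the Shape field.

```python
import numpy as np

ids = np.arange(3)
x = np.array([7, 5, 0])
y = np.array([6, 9, 1])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

flat = np.hstack((x, y))          # 1-D, length 6: all of x, then all of y
pairs = list(zip(ids, flat))      # zip stops after 3 items, so y is never reached
bad = np.array(pairs, dtype=dt)   # each scalar fills both Shape slots

# column_stack pairs x[i] with y[i], giving one length-2 row per id
good = np.array(list(zip(ids, np.column_stack((x, y)))), dtype=dt)

print(bad['Shape'])
print(good['Shape'])
```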

You can take a view of an array like a if the dtype is compatible - the data buffer for 3 int columns is laid out the same way as one with 3 int fields.

a.view('i4,i4,i4')

But your dtype wants 'i4,f8,f8', a mix of 4 and 8 byte fields, and a mix of int and float. The a buffer will have to be transformed to achieve that. view can't do it. (don't even ask about .astype.)
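
To make the size mismatch concrete, here is a sketch that forces a to 4-byte integers (an assumption: on many platforms np.arange gives 8-byte integers by default) and compares the bytes per row with the bytes per record. It also shows one way to do the transformation with a copy, numpy.lib.recfunctions.unstructured_to_structured, which I believe is available in recent NumPy releases; it is not part of the original answer.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ids = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])
a = np.column_stack((ids, x, y)).astype('<i4')   # force 4-byte ints

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# 12 bytes per row of a, but 20 bytes per record of dt
print(a.strides[0], dt.itemsize)   # 12 20

try:
    a.view(dt)
    view_ok = True
except ValueError:
    view_ok = False                # the buffer cannot be reinterpreted

# Copying conversion: columns are cast and packed into the fields in order
converted = rfn.unstructured_to_structured(a, dt)
print(view_ok, converted.shape)
```

Unlike a view, the converted array owns new memory, so later changes to a do not show up in it.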


Corrected list-of-tuples method:

In [35]: np.array([(i,j) for i,j in zip(id,np.column_stack((x,y)))],dt)
Out[35]: 
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])], 
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])

The list comprehension produces a list like:

[(0, array([8, 8])),
 (1, array([8, 0])),
 (2, array([6, 2])),
 ....]

For each tuple in the list, the [0] goes in the first field of the dtype, and [1] (a small array), goes in the 2nd.

The tuples could also be constructed with

[(i,[j,k]) for i,j,k in zip(id,x,y)]
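
A quick check, with made-up inputs, that the two ways of building the tuples should give the same structured array:

```python
import numpy as np

ids = np.arange(3)
x = np.array([8, 8, 6])
y = np.array([8, 0, 2])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# (id, small array) tuples, as in the list comprehension above
rows_a = [(i, xy) for i, xy in zip(ids, np.column_stack((x, y)))]
# (id, [x, y]) tuples built directly from the three 1-D arrays
rows_b = [(i, [j, k]) for i, j, k in zip(ids, x, y)]

arr_a = np.array(rows_a, dtype=dt)
arr_b = np.array(rows_b, dtype=dt)

print(np.array_equal(arr_a['Shape'], arr_b['Shape']))
```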

dt1 = np.dtype([('ID','<i4'),('Shape',('<i4',(2,)))])

is a view compatible dtype (still 3 integers)

In [42]: a.view(dtype=dt1)
Out[42]: 
array([[(0, [8, 8])],
       [(1, [8, 0])],
       [(2, [6, 2])],
       [(3, [8, 8])],
       [(4, [3, 2])],
       [(5, [6, 1])],
       [(6, [5, 6])],
       [(7, [7, 7])],
       [(8, [6, 1])],
       [(9, [6, 6])]], 
      dtype=[('ID', '<i4'), ('Shape', '<i4', (2,))])
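
Note the extra brackets in that output: the view keeps a trailing axis of length 1, so its shape is (N, 1). A sketch of dropping that axis, again assuming a holds 4-byte integers (forced here with astype):

```python
import numpy as np

ids = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])
a = np.column_stack((ids, x, y)).astype('<i4')   # 3 x 4 bytes per row

dt1 = np.dtype([('ID', '<i4'), ('Shape', ('<i4', (2,)))])

v = a.view(dt1)     # shape (4, 1): one record per row of a
recs = v[:, 0]      # shape (4,), still a view on a's buffer

print(v.shape, recs.shape)

a[0, 1] = 99        # a change made through a ...
print(recs['Shape'][0])   # ... is visible through the view
```

Because no data is copied, this is cheap, but it only works while every field has the same type and size as the columns of a.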

Upvotes: 1
