xinwei he

Reputation: 31

How to extract the values of a numpy array of a custom type more efficiently?

Suppose I have defined a custom type, as below:

import numpy as np

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.x = x
        self.y = y
        self.z = z

And I have a numpy array of mytype objects, defined as:

my_array = np.array([mytype()]*1000)

My question is: how can I extract the values from the numpy array defined above into a numpy array of type np.float64 more efficiently? I have found that using a list comprehension is very slow when the array is large, and I guess there must be a better way to do this job. Can anyone help me out?
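For reference, the list comprehension I mean looks something like the following (the exact field access is just an illustration of my code):

# slow: one Python-level attribute lookup and conversion per element
values = np.array([[t.x, t.y, t.z] for t in my_array], dtype=np.float64)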

Upvotes: 1

Views: 459

Answers (2)

aminrd

Reputation: 4990

Based on the Numpy documentation here, numpy.array calls the __array__ method of an object. So you can define an arbitrary conversion to a numpy.array like:

import numpy as np

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.x = x
        self.y = y
        self.z = z

    def __array__(self):
        # return this object's fields as a 1-D numpy array
        return np.array([self.x, self.y, self.z])

Then you can convert a single mytype() object to an np.array by:

tmp = mytype()
np.array(tmp)
# array([1, 2, 3])

Now, when you have a list of 1000 objects, you can map np.array over all of them:

new_list = list(map(np.array, [mytype()]*1000))
#[array([1, 2, 3]), array([1, 2, 3]), array([1, 2, 3]), array([1, 2, 3]), ...
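If what you ultimately want is a single (1000, 3) array of np.float64 rather than a list of small arrays, the mapped arrays can be combined in one step (a sketch building on new_list from above):

# collect the per-object arrays into one 2-D float64 array
combined = np.array(new_list, dtype=np.float64)
# combined.shape == (1000, 3), combined.dtype == float64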

Upvotes: 1

Bobby Ocean

Reputation: 3314

Numpy is fast because it is nearly pure C code running computations on C arrays. For C arrays, things need to be neat and clean: how much space do we use? What is the size of each object in that space? How many objects do we have? When you create a collection of arbitrary Python objects (which can have dynamic size) and then want to place that collection into a numpy array, each object has to be found and converted individually, and there isn't really any way around that.

my_array = np.array([mytype() for _ in range(1000)])

This is basically 1000 pointers to arbitrary objects. Numpy knows nothing about those objects except where to ask Python for more information about them. As such, the above array has no C code to speed up the process; it is nearly equivalent to a list:

my_array = [mytype() for _ in range(1000)]

If you want to make your code faster, you shouldn't make numpy arrays of arbitrary objects. Likewise, you shouldn't use Python integers (which can be any size and carry a lot of overhead) when you really want float64. For example, your class could be updated:

import numpy as np

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        # store the fields in a fixed-size float64 array
        self.data = np.array([x, y, z], dtype='float64')

At least now each self.data can be accessed and stacked together, and since numpy knows the exact size and dtype of each object's data, it can gather up all 1000 blocks of memory and copy them into a new array quite quickly.
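For example, a minimal sketch of that gathering step with the updated class (np.stack is my choice here; any function that concatenates the fixed-size arrays would do):

import numpy as np

objects = [mytype() for _ in range(1000)]

# each .data is already a fixed-size float64 array, so copying each
# element is a cheap memory copy rather than a Python-level conversion
values = np.stack([obj.data for obj in objects])
# values.shape == (1000, 3), values.dtype == float64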

Upvotes: 1
