Tristan

Reputation: 43

Handling attributes of a class within a numpy array

I would like to handle class attributes without going through a Python for loop. For large arrays NumPy is the fastest option, but is it possible to access class attributes within a NumPy array? Consider the following simplified code:

import numpy as np

class MyClass():
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

myarray1 = np.arange(0, 1000, 1)
myarray2 = np.array([MyClass() for i in range(1000)])

All the values of myarray1 can easily be modified in one line:

myarray1 += 5

But how can I access myvar1 of all the MyClass instances in myarray2 and modify it in one go? (Is it even possible?) I know that the following does not work, but it gives the idea of what I want to achieve:

myarray2.myvar1 += 5
myarray2[myarray2.myvar1] += 5

I have been looking around a lot for a solution, and the closest thing I could find is NumPy's recarray, which can somewhat mimic Python classes. However, it does not seem to be a solution for me: the class I am using is a subclass (of a pyglet Sprite, to be exact), so I do need a Python class.
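For comparison, here is a minimal sketch of the record-array approach mentioned above. It assumes the per-instance state can be reduced to plain numeric fields (the field names simply mirror MyClass), so it does not cover a pyglet Sprite subclass. Attribute-style access on a np.recarray returns the whole column, which is why the one-liner from the question works there:

```python
import numpy as np

# One int64 field per former attribute; viewing as recarray enables rec.field access
rec = np.zeros(1000, dtype=[("myvar1", np.int64), ("myvar2", np.int64)]).view(np.recarray)
rec.myvar1 = 10  # scalar is broadcast to every record, like MyClass.__init__
rec.myvar2 = 20

# rec.myvar1 is a view of the column, so this updates all 1000 records at once
rec.myvar1 += 5
print(rec.myvar1[:3])  # [15 15 15]
```

The trade-off is that the records are plain data: no methods and no inheritance, which is exactly what rules this out for a Sprite.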

Edit

Following up on hpaulj's comment, I am trying to use a vectorized method of the class to update its attribute. Is this an efficient way of updating all the instances of the class?

class MyClass():
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20
    def modifyvar(self):
        self.myvar1 += 5
        return self

vecfunc = np.vectorize(MyClass.modifyvar)
myarray2 = np.array([MyClass() for i in range(1000)])
myarray2 = vecfunc(myarray2)

However, another problem arises: when I use this code, myarray2[0].myvar1 returns 20 instead of 15! myarray2[1].myvar1 does return 15, and so does the rest of the array. Why is myarray2[0] different here?


Solution

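Vectorizing a method of the class allows updating the attribute of all its instances without writing an explicit for loop (np.vectorize still loops over the elements in Python internally, so this is a convenience rather than a speed-up). The code of the solution: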

class MyClass():
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20
    def modifyvar(self):
        self.myvar1 += 5
        return self

vecfunc = np.vectorize(MyClass.modifyvar, otypes=[object])
myarray2 = np.array([MyClass() for i in range(1000)])
vecfunc(myarray2)

Note: add otypes=[object] when using vectorize with objects.

Upvotes: 4

Views: 2522

Answers (1)

hpaulj

Reputation: 231615

The extra application of modifyvar to the first element comes from vectorize trying to determine the type of array to return. Specifying otypes gets around that problem:

vecfunc = np.vectorize(MyClass.modifyvar,otypes=[object])

Since this method modifies the instances in place, you don't need to pay attention to what is returned:

vecfunc(myarray2)

is sufficient.
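A small check of why the return value can be ignored: because modifyvar returns self, the object array that vecfunc returns holds the very same instances as myarray2, so the mutation is visible through either name. This sketch reuses the question's MyClass with a 5-element array for brevity:

```python
import numpy as np

class MyClass:
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

    def modifyvar(self):
        self.myvar1 += 5
        return self

vecfunc = np.vectorize(MyClass.modifyvar, otypes=[object])
myarray2 = np.array([MyClass() for _ in range(5)])

out = vecfunc(myarray2)  # keeping the return value is optional
# Same objects, not copies: each element was mutated in place
print(all(a is b for a, b in zip(out, myarray2)))  # True
print([x.myvar1 for x in myarray2])               # [15, 15, 15, 15, 15]
```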

From the vectorize documentation:

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.
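The behaviour described in that documentation note can be reproduced directly. In this sketch, each case uses a fresh 5-element array: without otypes the first instance is modified twice, and with otypes=[object] every instance is modified once. The third case uses vectorize's cache=True option, which the thread does not mention; it keeps the result of the trial call instead of calling the method on the first element again:

```python
import numpy as np

class MyClass:
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

    def modifyvar(self):
        self.myvar1 += 5
        return self

def fresh(n=5):
    # 1-D object array of new, unmodified instances
    return np.array([MyClass() for _ in range(n)])

# No otypes: one trial call on element 0 to infer the output type,
# then the normal pass over every element (element 0 again)
arr_default = fresh()
np.vectorize(MyClass.modifyvar)(arr_default)
print([x.myvar1 for x in arr_default])  # [20, 15, 15, 15, 15]

# otypes given: no trial call, each instance is modified once
arr_otypes = fresh()
np.vectorize(MyClass.modifyvar, otypes=[object])(arr_otypes)
print([x.myvar1 for x in arr_otypes])  # [15, 15, 15, 15, 15]

# cache=True: the trial call's result is reused for element 0
arr_cache = fresh()
np.vectorize(MyClass.modifyvar, cache=True)(arr_cache)
print([x.myvar1 for x in arr_cache])  # [15, 15, 15, 15, 15]
```

Specifying otypes is the simpler fix when the output type is known in advance, as it is here.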

If you defined an add5 method like:

    def add5(self):
        self.myvar1 += 5
        return self.myvar1

then

vecfunc = np.vectorize(MyClass.add5,otypes=[int])
vecfunc(myarray2)

would return a numeric array and modify myarray2 at the same time (this output is for a fresh 10-element myarray2):

array([15, 15, 15, 15, 15, 15, 15, 15, 15, 15])

To display the values, I use:

[x.myvar1 for x in myarray2]

I really should define a vectorized 'print'.
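Both ideas are shown in this sketch on a fresh 10-element array. First, add5 vectorized with otypes=[int] returns a plain integer array while mutating the instances. Second, a vectorized attribute getter, a hypothetical helper built from operator.attrgetter rather than anything defined in the thread, gives a compact way to display one attribute for every element:

```python
import operator
import numpy as np

class MyClass:
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

    def add5(self):
        self.myvar1 += 5
        return self.myvar1

myarray2 = np.array([MyClass() for _ in range(10)])

vecfunc = np.vectorize(MyClass.add5, otypes=[int])
result = vecfunc(myarray2)  # numeric result; instances updated as a side effect
print(result)  # [15 15 15 15 15 15 15 15 15 15]

# Hypothetical helper: read one attribute from every element into an int array
get_myvar1 = np.vectorize(operator.attrgetter("myvar1"), otypes=[int])
print(get_myvar1(myarray2))  # [15 15 15 15 15 15 15 15 15 15]
```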

This looks like one of the better applications of vectorize. It doesn't give you any compiled speed, but it does let you use array notation and broadcasting while operating on your instances one by one. For example, vecfunc(myarray2.reshape(2,5)) returns a (2,5) array of values.
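To illustrate the broadcasting point, this sketch reshapes a 10-element object array to (2, 5) and broadcasts a second argument against it. The addn method is hypothetical (add5 with the increment passed in). vectorize pairs each instance with the matching element of the broadcast increments, still calling the method once per instance in Python:

```python
import numpy as np

class MyClass:
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

    def addn(self, n):
        # Hypothetical variant of add5 that takes the increment as an argument
        self.myvar1 += n
        return self.myvar1

myarray2 = np.array([MyClass() for _ in range(10)])
grid = myarray2.reshape(2, 5)  # a view: still references the same 10 instances

vec_addn = np.vectorize(MyClass.addn, otypes=[int])
# The (5,) increments broadcast against the (2, 5) grid of instances
result = vec_addn(grid, np.arange(5))
print(result.shape)  # (2, 5)
print(result)
# [[10 11 12 13 14]
#  [10 11 12 13 14]]
```

Because reshape returns a view, the instances in myarray2 are updated as well.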

Upvotes: 2
