Jorge Canelhas

Reputation: 140

apply along axis numpy with different sizes of array

I'm trying to apply a function to all the rows of a numpy array. It works if the lists in the rows have the same size, but fails whenever one has a different size.

The function to be applied

from math import sin, cos
import operator

import numpy as np


def parseRPN(expression, roundtointeger=False):
    """Parse and evaluate an RPN expression.

    Takes a list of tokens in the form ['2', '2', '*']
    and returns the result as a float (4.0 here).
    """

    def safe_divide(darg1, darg2):
        ERROR_VALUE = 1.
        # ORIGINAL ___ Here we can penalize asymptotes with the var PENALIZE_ASYMPITOTES

        try:
            return darg1 / darg2
        except ZeroDivisionError:
            return ERROR_VALUE

    function_twoargs = {'*': operator.mul, '/': safe_divide, '+': operator.add, '-': operator.sub}
    function_onearg = {'sin': sin, 'cos': cos}
    stack = []
    for val in expression:
        result = None
        if val in function_twoargs:
            arg2 = stack.pop()
            arg1 = stack.pop()
            result = function_twoargs[val](arg1, arg2)
        elif val in function_onearg:
            arg = stack.pop()
            result = function_onearg[val](arg)
        else:
            result = float(val)
        stack.append(result)

    # The final result is whatever is left on top of the stack.
    result = stack.pop()
    if roundtointeger:
        result = round(result)
    return result

NOT OK

dat=np.array([['4','5','*','6','+','3','/'],['4','4','*','6','*'],['4','5','*','6','+'],['4','5','*','6','+']])
lout=np.apply_along_axis(parseRPN,0,dat)

print(dat)
print(lout)

OK

dat=np.array([['4','5','*','6','+'],['4','4','*','6','*'],['4','5','*','6','+'],['4','5','*','6','+']])
lout=np.apply_along_axis(parseRPN,1,dat)  # axis=1 passes each row to parseRPN; axis=0 would pass the columns

print(dat)
print(lout)

Am I using the right tool for the job? The idea here is to vectorize the computation of a series of lists.

Thanks

Upvotes: 0

Views: 1010

Answers (2)

hpaulj

Reputation: 231395

With complex 'row' processing like this, you might as well treat the array as a list.

With equal length rows, dat is a 2d character array:

In [138]: dat=np.array([['4','5','*','6','+'],['4','4','*','6','*'],['4','5','*'
     ...: ,'6','+'],['4','5','*','6','+']])
In [139]: dat
Out[139]: 
array([['4', '5', '*', '6', '+'],
       ['4', '4', '*', '6', '*'],
       ['4', '5', '*', '6', '+'],
       ['4', '5', '*', '6', '+']],
      dtype='<U1')

With varying lengths, the array is a 1d object-dtype array containing lists:

In [140]: dat1=np.array([['4','5','*','6','+','3','/'],['4','4','*','6','*'],['4
     ...: ','5','*','6','+'],['4','5','*','6','+']])
In [141]: dat1
Out[141]: 
array([list(['4', '5', '*', '6', '+', '3', '/']),
       list(['4', '4', '*', '6', '*']), 
       list(['4', '5', '*', '6', '+']),
       list(['4', '5', '*', '6', '+'])], dtype=object)
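
A note on versions: the session above relies on np.array silently building an object array from ragged input. As far as I know, recent NumPy releases (1.24 and later) raise a ValueError for that unless the dtype is given explicitly. A minimal sketch of the explicit form (the variable name rows is only for illustration):

```python
import numpy as np

# Ragged input: the first expression has 7 tokens, the others have 5.
rows = [['4', '5', '*', '6', '+', '3', '/'],
        ['4', '4', '*', '6', '*'],
        ['4', '5', '*', '6', '+'],
        ['4', '5', '*', '6', '+']]

# dtype=object asks for a 1d array whose elements are the Python lists.
dat1 = np.array(rows, dtype=object)

print(dat1.shape)      # one axis, one element per expression
print(type(dat1[0]))   # each element is still a plain list
```

Either way the array is just a container for lists, which is why row-wise tools that expect a 2d array do not apply to it.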

In either case, a simple row iteration works fine (map also works, but in Py3 you have to use list(map(...))).

In [142]: [parseRPN(row) for row in dat]
Out[142]: [26.0, 96.0, 26.0, 26.0]
In [143]: [parseRPN(row) for row in dat1]
Out[143]: [8.666666666666666, 96.0, 26.0, 26.0]

apply_along_axis also uses iteration like this. It's nice when the array is 3d or higher, but for row iteration on a 1 or 2d array it is overkill.
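
To make the equivalence concrete, here is a small sketch on the equal-length case. It uses a cut-down eval_rpn as a stand-in for parseRPN (the name and the reduced operator table are my own, for illustration only). Note the axis argument: axis=1 hands each row to the function, while axis=0 would hand it the columns.

```python
import operator
import numpy as np

# Hypothetical, simplified stand-in for parseRPN: binary operators only.
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def eval_rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            rhs = stack.pop()   # right operand is on top of the stack
            lhs = stack.pop()
            stack.append(OPS[tok](lhs, rhs))
        else:
            stack.append(float(tok))
    return stack.pop()

dat = np.array([['4', '5', '*', '6', '+'],
                ['4', '4', '*', '6', '*'],
                ['4', '5', '*', '6', '+'],
                ['4', '5', '*', '6', '+']])

# axis=1: each 1d slice passed to eval_rpn is one row (one expression).
by_rows = np.apply_along_axis(eval_rpn, 1, dat)

# The plain loop does the same work without the extra machinery.
by_loop = [eval_rpn(row) for row in dat]

print(by_rows)
print(by_loop)
```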

For an object array like dat1, frompyfunc might have a modest speed advantage:

In [144]: np.frompyfunc(parseRPN,1,1)(dat1)
Out[144]: array([8.666666666666666, 96.0, 26.0, 26.0], dtype=object)
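
The frompyfunc result has dtype=object. If a numeric array is wanted, it can be cast afterwards. A minimal sketch, again with a simplified eval_rpn standing in for parseRPN (the stand-in is an assumption for illustration, not the code from the question):

```python
import operator
import numpy as np

# Hypothetical, simplified stand-in for parseRPN: binary operators only.
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def eval_rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[tok](lhs, rhs))
        else:
            stack.append(float(tok))
    return stack.pop()

# 1d object array of lists with different lengths.
dat1 = np.array([['4', '5', '*', '6', '+', '3', '/'],
                 ['4', '4', '*', '6', '*'],
                 ['4', '5', '*', '6', '+'],
                 ['4', '5', '*', '6', '+']], dtype=object)

# frompyfunc(func, nin, nout) calls eval_rpn once per element (one list each).
rpn_ufunc = np.frompyfunc(eval_rpn, 1, 1)
as_objects = rpn_ufunc(dat1)             # dtype=object
as_floats = as_objects.astype(float)     # convert to float64

print(as_objects.dtype, as_floats.dtype)
print(as_floats)
```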

np.vectorize is slower, but also works with the object array:

In [145]: np.vectorize(parseRPN)(dat1)
Out[145]: array([  8.66666667,  96.        ,  26.        ,  26.        ])

But applying it to the 2d character array requires the use of its signature parameter, which is slower and trickier.
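
For completeness, a sketch of what the signature form could look like. The string '(n)->()' tells np.vectorize that the function consumes a whole 1d row and returns a scalar. eval_rpn is a simplified stand-in for parseRPN (my own illustrative name, binary operators only).

```python
import operator
import numpy as np

# Hypothetical, simplified stand-in for parseRPN: binary operators only.
OPS = {'*': operator.mul, '/': operator.truediv,
       '+': operator.add, '-': operator.sub}

def eval_rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[tok](lhs, rhs))
        else:
            stack.append(float(tok))
    return stack.pop()

dat = np.array([['4', '5', '*', '6', '+'],
                ['4', '4', '*', '6', '*'],
                ['4', '5', '*', '6', '+'],
                ['4', '5', '*', '6', '+']])

# '(n)->()': core input is a length-n vector, core output is a scalar,
# so the function is called once per row of the 2d array.
row_eval = np.vectorize(eval_rpn, signature='(n)->()')
out = row_eval(dat)

print(out)
```

This only works for the rectangular array; it is still a Python-level loop, so it buys convenience rather than speed.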

numpy doesn't help with this problem. This is really a list of lists problem:

In [148]: dat=[['4','5','*','6','+'],['4','4','*','6','*'],['4','5','*','6','+']
     ...: ,['4','5','*','6','+']]
In [149]: [parseRPN(row) for row in dat]
Out[149]: [26.0, 96.0, 26.0, 26.0]

Upvotes: 2

Kyle

Reputation: 3308

Your code works fine if you just use map or a list comprehension.

list(map(parseRPN, dat))  # list() is needed in Python 3, where map returns an iterator

I wouldn't worry about figuring out numpy's apply until you actually need to improve the performance.

Upvotes: 1
