Paul Rougieux

Reputation: 11419

Why is numpy.vectorize() changing the division output of a scalar function?

I'm obtaining a strange result when I vectorise a function with numpy.

import numpy as np
def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y :
        out = x * y 
    else:
        out = x/y 
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)

The scalar function returns the expected value:

scalar_function(4,3)
# 1.3333333333333333

Why is the vectorized version giving this strange output?

vector_function(np.array([3,4]), np.array([4,3]))
[12  1]

While this call to the vectorized version works fine:

vector_function(np.array([4,4]), np.array([4,3]))
[1.         1.33333333]

Reading numpy.divide:

Notes The floor division operator // was added in Python 2.2 making // and / equivalent operators. The default floor division operation of / can be replaced by true division with from __future__ import division. In Python 3.0, // is the floor division operator and / the true division operator. The true_divide(x1, x2) function is equivalent to true division in Python.

This makes me think it might be a leftover issue related to Python 2, but I'm using Python 3!

Upvotes: 1

Views: 357

Answers (2)

roganjosh

Reputation: 13185

The docs for numpy.vectorize state:

The output type is determined by evaluating the first element of the input, unless it is specified

Since you did not specify a return data type, and the first element of your first example goes through integer multiplication, the output array is given an integer dtype and the later float result is truncated to fit it. Conversely, when the first element goes through division, the output dtype is float. You can fix your code by specifying the output type with otypes in vector_function (it doesn't necessarily have to be as wide as 64 bits for this problem):

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function, otypes=[np.float64])
    return v_scalar_function(x, y)
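
To see the type inference at work, one can inspect the dtype of the output for both calls from the question. This is only a minimal sketch: the names vf, int_first and float_first are illustrative, and the scalar function is condensed to a one-line equivalent of the original.

```python
import numpy as np

def scalar_function(x, y):
    """Return x*y if x < y, and x/y otherwise."""
    return x * y if x < y else x / y

# No otypes given, so the output type is inferred from the first element.
vf = np.vectorize(scalar_function)

# First pair is (3, 4): the multiplication branch returns an integer,
# so the whole output array gets an integer dtype.
int_first = vf(np.array([3, 4]), np.array([4, 3]))

# First pair is (4, 4): the division branch returns a float,
# so the whole output array gets a float dtype.
float_first = vf(np.array([4, 4]), np.array([4, 3]))

print(int_first.dtype.kind, int_first)
print(float_first.dtype.kind, float_first)
```

A dtype kind of 'i' means integer and 'f' means floating point, so the first call is the one that loses the fractional part.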

Separately, note from that same documentation that numpy.vectorize is a convenience function that essentially wraps a Python for loop, so it is not vectorized in the sense of providing any real performance gain.

For a binary choice like this, a better overall approach would be:

def vectorized_scalar_function(arr_1, arr_2):
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)

print(vectorized_scalar_function(np.array([4,4]), np.array([4,3])))
print(vectorized_scalar_function(np.array([3,4]), np.array([4,3])))

The above should be far faster on large arrays, and it doesn't suffer from the type-casting issue: arr_1 / arr_2 is always a float array, so np.where returns a float result no matter which branch the first element takes.
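
To check where the float result of the np.where approach comes from, it can help to look at the dtype of each branch separately. The sketch below uses illustrative names (where_function, product, quotient) that are not part of the original answer.

```python
import numpy as np

def where_function(arr_1, arr_2):
    # Both branches are evaluated for every element;
    # np.where then selects one value per position.
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)

a = np.array([3, 4])
b = np.array([4, 3])

product = a * b    # integer array
quotient = a / b   # true division of integer arrays gives a float array
result = where_function(a, b)

print(product.dtype.kind, quotient.dtype.kind, result.dtype.kind)
print(result)
```

Because both branches are computed in full before the selection, a zero in arr_2 would still raise a divide-by-zero warning even at positions where the product is the value kept.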

Upvotes: 6

warped

Reputation: 9481

Checking which statements are triggered:

import numpy as np

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y :
        print('if x: ',x)
        print('if y: ',y)
        out = x * y 
        print('if out', out)
    else:
        print('else x: ',x)
        print('else y: ',y)
        out = x/y
        print('else out', out)

    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)


vector_function(np.array([3,4]), np.array([4,3]))

if x:  3
if y:  4
if out 12
if x:  3
if y:  4
if out 12
else x:  4
else y:  3
else out 1.3333333333333333 # <-- seems that the value is calculated correctly, but the wrong dtype is returned

So you can rewrite the scalar function to always return a float:

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y :
        out = x * y 
    else:
        out = x/y
    return float(out)


vector_function(np.array([3,4]), np.array([4,3]))
array([12.        ,  1.33333333])
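
The trace earlier in this answer prints the first pair (3, 4) twice. That is consistent with np.vectorize calling the function once on the first element to infer the output type, and then once per element. A rough way to check this is to record the calls, with and without otypes. The calls list and the variable names below are illustrative additions, and the exact call counts are an observation about current NumPy behaviour rather than a documented guarantee.

```python
import numpy as np

calls = []  # records every (x, y) pair the scalar function receives

def scalar_function(x, y):
    """Return x*y if x < y, and x/y otherwise."""
    calls.append((int(x), int(y)))
    return x * y if x < y else x / y

x = np.array([3, 4])
y = np.array([4, 3])

# No otypes: the output type is inferred from a call on the first pair.
inferred = np.vectorize(scalar_function)(x, y)
calls_inferred = list(calls)

calls.clear()

# otypes given: no inference call should be needed.
explicit = np.vectorize(scalar_function, otypes=[np.float64])(x, y)
calls_explicit = list(calls)

print(len(calls_inferred), calls_inferred)
print(len(calls_explicit), calls_explicit)
print(inferred, explicit)
```

Passing otypes keeps the scalar function unchanged and also avoids the extra inference call, which matters if the function is expensive or has side effects.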

Upvotes: 2
