Why is numpy.vectorize() changing the division output of a scalar function?

Question

I'm obtaining a strange result when I vectorise a function with numpy.

import numpy as np
def scalar_function(x, y):
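    """ A function that returns x*y if x < y and x/y otherwise
    """
    if x < y:
        out = x * y
    else:
        out = x / y
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)

We do have: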

scalar_function(4,3)
# 1.3333333333333333


Why is the vectorized version giving this strange output? 

vector_function(np.array([3,4]), np.array([4,3]))
[12  1]


While this call to the vectorized version works fine:

vector_function(np.array([4,4]), np.array([4,3]))
[1.         1.33333333]


Reading numpy.divide:


  Notes
  The floor division operator // was added in Python 2.2 making // and / equivalent operators. The default floor division operation of / can be replaced by true division with from __future__ import division.
  In Python 3.0, // is the floor division operator and / the true division operator. The true_divide(x1, x2) function is equivalent to true division in Python.


This makes me think it might be a leftover issue related to Python 2, but I'm using Python 3!

roganjosh · Accepted Answer

The docs for numpy.vectorize state:

The output type is determined by evaluating the first element of the input, unless it is specified

Since you did not specify a return data type, and the first pair of elements (3 and 4) goes through the integer multiplication branch, the output array is given an integer dtype. Every later result is then cast to that dtype, so 4/3 is truncated to 1. Conversely, when the first pair goes through the division branch, the output dtype is inferred as float. You can fix your code by specifying the output dtype via otypes in vector_function (which doesn't necessarily have to be as big as 64-bit for this problem):

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function, otypes=[np.float64])
    return v_scalar_function(x, y)
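
To see the inference in action, here is a minimal sketch comparing the inferred and the explicit output type. The body of scalar_function is an assumption based on the behaviour described in the question (multiply if x < y, otherwise divide), and the names inferred and explicit are only for illustration.

```python
import numpy as np

def scalar_function(x, y):
    # Assumed behaviour: multiply if x < y, otherwise true division.
    return x * y if x < y else x / y

# Output dtype inferred from the first element vs. given explicitly.
inferred = np.vectorize(scalar_function)
explicit = np.vectorize(scalar_function, otypes=[np.float64])

x, y = np.array([3, 4]), np.array([4, 3])

res_inferred = inferred(x, y)  # first element is 3 * 4, an integer
res_explicit = explicit(x, y)  # output forced to float64

print(res_inferred.dtype, res_inferred)
print(res_explicit.dtype, res_explicit)
```

With the inferred version the output has an integer dtype and 4/3 is truncated; with otypes the fractional part is kept.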

Separately, note from that same documentation that numpy.vectorize is a convenience function that essentially wraps a Python for loop, so it is not vectorized in the sense of providing any real performance gain.
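
One way to convince yourself of this is to count how often the Python function gets called. This is only a sketch: the calls list is a hypothetical counter added for illustration, and the function body is again assumed from the question.

```python
import numpy as np

calls = []  # hypothetical counter: one entry per Python-level call

def scalar_function(x, y):
    calls.append(1)
    # Assumed behaviour: multiply if x < y, otherwise true division.
    return x * y if x < y else x / y

# otypes is given, so no extra probing call is needed to infer the dtype.
v_scalar_function = np.vectorize(scalar_function, otypes=[np.float64])

x = np.arange(1, 6)   # [1, 2, 3, 4, 5]
y = np.full(5, 3)     # [3, 3, 3, 3, 3]
result = v_scalar_function(x, y)

print(len(calls), "calls for", x.size, "elements")
```

The Python function runs once per element, which is why the cost scales like an ordinary loop.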

For a binary choice like this, a better overall approach would be:

import numpy as np

def vectorized_scalar_function(arr_1, arr_2):
    # Element-wise choice: product where arr_1 < arr_2, quotient elsewhere
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)

print(vectorized_scalar_function(np.array([4, 4]), np.array([4, 3])))
print(vectorized_scalar_function(np.array([3, 4]), np.array([4, 3])))

The above should be orders of magnitude faster on large arrays, and it doesn't suffer from the type-casting issue in the result (though that may be coincidental rather than a hard-and-fast rule to rely on).
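
A rough check of both claims, as a sketch rather than a benchmark. The random test arrays, their size, and the name looped are assumptions made for illustration, and timings will vary by machine. On the dtype: true division of two integer arrays yields a float array, and np.where returns the common type of its two branches, so the result should come out as float64 whichever branch the first element takes.

```python
import timeit
import numpy as np

def scalar_function(x, y):
    # Assumed behaviour: multiply if x < y, otherwise true division.
    return x * y if x < y else x / y

def vectorized_scalar_function(arr_1, arr_2):
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)

looped = np.vectorize(scalar_function, otypes=[np.float64])

# Result dtype is the common type of the int product and the float quotient.
small = vectorized_scalar_function(np.array([3, 4]), np.array([4, 3]))
print(small.dtype, small)

# Hypothetical test data: values 1..9, so there is no division by zero.
rng = np.random.default_rng(0)
a = rng.integers(1, 10, size=50_000)
b = rng.integers(1, 10, size=50_000)

t_loop = timeit.timeit(lambda: looped(a, b), number=3)
t_where = timeit.timeit(lambda: vectorized_scalar_function(a, b), number=3)
print(f"np.vectorize: {t_loop:.4f}s, np.where: {t_where:.4f}s")
```

Note that np.where computes both branches for every element before selecting, so a zero in arr_2 would still trigger a division warning even where the product is the value chosen.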
