Reputation: 11419
I'm obtaining a strange result when I vectorise a function with numpy.
import numpy as np

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y:
        out = x * y
    else:
        out = x / y
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)
The scalar version gives the expected result:
scalar_function(4,3)
# 1.3333333333333333
Why is the vectorized version giving this strange output?
vector_function(np.array([3,4]), np.array([4,3]))
[12 1]
While this call to the vectorized version works fine:
vector_function(np.array([4,4]), np.array([4,3]))
[1. 1.33333333]
The notes for numpy.divide say:
Notes: The floor division operator // was added in Python 2.2 making // and / equivalent operators. The default floor division operation of / can be replaced by true division with from __future__ import division. In Python 3.0, // is the floor division operator and / the true division operator. The true_divide(x1, x2) function is equivalent to true division in Python.
This makes me think it might be a leftover issue related to Python 2, but I'm using Python 3!
Upvotes: 1
Views: 357
Reputation: 13185
The docs for numpy.vectorize state:
The output type is determined by evaluating the first element of the input, unless it is specified
Since you did not specify a return data type, and the first pair of elements (3, 4) takes the integer multiplication branch, the output array is given an integer dtype. The later result 1.333... is then truncated to 1 when it is cast into that array. Conversely, when the first operation is a division, the output dtype is float, and integer results are stored as floats without loss. You can fix your code by passing otypes to np.vectorize inside vector_function (the float type doesn't necessarily have to be as big as 64-bit for this problem):
def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function, otypes=[np.float64])
    return v_scalar_function(x, y)
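To see the effect of otypes directly, the dtype of the result can be inspected with and without it. This is a minimal sketch rather than a definitive test; the variable names inferred and explicit are made up for illustration, and scalar_function is condensed to one line.

```python
import numpy as np

def scalar_function(x, y):
    """Return x*y if x < y, otherwise x/y."""
    return x * y if x < y else x / y

x = np.array([3, 4])
y = np.array([4, 3])

# No otypes: the output dtype is taken from the result for the first pair
# (3, 4), which is the integer product 12.
inferred = np.vectorize(scalar_function)(x, y)

# otypes given: the output dtype is fixed up front, whatever the first pair is.
explicit = np.vectorize(scalar_function, otypes=[np.float64])(x, y)

print(inferred.dtype, inferred)
print(explicit.dtype, explicit)
```

The first result has an integer dtype, so 4/3 is stored as 1; the second is float64 and keeps the fractional part.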
Separately, you should also note from that same documentation that numpy.vectorize is a convenience function that essentially wraps a Python for loop, so it is not vectorized in the sense of providing any real performance gains.
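As a rough mental model (an illustration, not NumPy's actual implementation), np.vectorize behaves like the explicit loop below: the Python function is still called once per element. The helper name loop_version is hypothetical.

```python
import numpy as np

def scalar_function(x, y):
    """Return x*y if x < y, otherwise x/y."""
    return x * y if x < y else x / y

def loop_version(x, y):
    """Hypothetical helper: call scalar_function element by element."""
    x, y = np.broadcast_arrays(x, y)
    out = np.empty(x.shape, dtype=np.float64)
    for idx in np.ndindex(x.shape):
        # One Python-level function call per element, as with np.vectorize.
        out[idx] = scalar_function(x[idx], y[idx])
    return out

x = np.array([3, 4])
y = np.array([4, 3])

looped = loop_version(x, y)
vectorized = np.vectorize(scalar_function, otypes=[np.float64])(x, y)
print(looped, vectorized)
```

Both give the same values; the per-element Python call is what keeps either version slow on large arrays.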
For a binary choice like this, a better overall approach would be:
def vectorized_scalar_function(arr_1, arr_2):
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)
print(vectorized_scalar_function(np.array([4,4]), np.array([4,3])))
print(vectorized_scalar_function(np.array([3,4]), np.array([4,3])))
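The above should be orders of magnitude faster on large arrays. It also doesn't suffer from the type casting issue: np.where promotes its two value arrays to a common dtype, so the integer products and the float quotients come back together as float64. Note that both arr_1 * arr_2 and arr_1 / arr_2 are computed in full before np.where selects between them.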
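The dtype of the np.where result can be checked by looking at the two branch arrays separately. The sketch below assumes integer inputs as in the question; the names product, quotient and result are illustrative only.

```python
import numpy as np

a = np.array([3, 4])
b = np.array([4, 3])

product = a * b    # integer array
quotient = a / b   # float64 array: / is true division on arrays in Python 3

# Both branches are already fully computed here; np.where only selects.
result = np.where(a < b, product, quotient)

print(product.dtype, quotient.dtype, result.dtype)
print(np.result_type(product, quotient))
print(result)
```

Because both branches are evaluated for every element, a zero in b would still trigger a divide warning even where the product branch is selected.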
Upvotes: 6
Reputation: 9481
Checking which statements are triggered:
import numpy as np

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y:
        print('if x: ', x)
        print('if y: ', y)
        out = x * y
        print('if out', out)
    else:
        print('else x: ', x)
        print('else y: ', y)
        out = x / y
        print('else out', out)
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)

vector_function(np.array([3,4]), np.array([4,3]))
if x: 3
if y: 4
if out 12
if x: 3
if y: 4
if out 12
else x: 4
else y: 3
else out 1.3333333333333333 # <-- seems that the value is calculated correctly, but the wrong dtype is returned
So, you can rewrite the scalar function:
def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y:
        out = x * y
    else:
        out = x / y
    return float(out)

vector_function(np.array([3,4]), np.array([4,3]))
array([12. , 1.33333333])
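The printed trace shows the first pair (3, 4) being evaluated twice. That is consistent with np.vectorize making one extra call on the first element to work out the output type when otypes is not given. A small sketch that records the calls instead of printing them (the calls list and the two result names are illustrative, not part of the original code):

```python
import numpy as np

calls = []

def scalar_function(x, y):
    """Return x*y if x < y, otherwise x/y, recording each call."""
    calls.append((int(x), int(y)))
    return x * y if x < y else x / y

x = np.array([3, 4])
y = np.array([4, 3])

# Without otypes: one extra call on the first pair to infer the dtype.
np.vectorize(scalar_function)(x, y)
calls_inferred = list(calls)

# With otypes: no inference call is needed.
calls.clear()
np.vectorize(scalar_function, otypes=[np.float64])(x, y)
calls_explicit = list(calls)

print(calls_inferred)
print(calls_explicit)
```

Returning float(out) works for the same reason: the extra call on the first pair then yields a float, so the inferred output dtype is float.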
Upvotes: 2