Reputation: 1565
I'm in the process of vectorizing a series of operations with numpy arrays. And in doing that, I'm getting different results in different executions of my code. I'm using numpy.random.randn
to get numpy arrays of the shapes that I need and testing if my operations do what I want them to do. I sometimes get the desired results and sometimes I don't, even though it's the same series of operations.
In theory, the first row of big_result
should be equal to result_1
, while the second row equal to result_2
. This happens sometimes. Other times the first row is correct, while the second one isn't. And other times, both are erroneus.
This is the code to reproduce the problem (I added some seeds I got by hand, in which the error happens and some where it doesn't):
import numpy as np
np.random.seed(2) # First row of big_result corresponds to result_1, the rest is wrong
np.random.seed(3) # First row of big_result corresponds to result_1, the rest is wrong
np.random.seed(4) # Both rows of big_result are ok
# My two vectors
a = np.random.randn(1000, 8)
b = np.random.randn(500, 8)
# Operation vector
c = np.random.randn(3, 6, 8)
# Matrix with both a, b
big_A = np.full((2, 1000, 8), np.nan)
big_A[0, :, :] = a
big_A[1, :500, :] = b
# Broadcasting used to enable the use of c with big_A
big_C = np.broadcast_to(c, (2, 3, 6, 8)).reshape((3, 6, 2, 8))
# First result (for a)
result_1 = np.linalg.norm(a[:, np.newaxis] - c[:, np.newaxis, :], axis=-1)
result_1 = np.min(result_1[np.newaxis], axis=(-1, 1)) # Shape: (1, 1000)
# Second result (for b)
result_2 = np.linalg.norm(b[:, np.newaxis] - c[:, np.newaxis, :], axis=-1)
result_2 = np.min(result_2[np.newaxis], axis=(-1, 1)) # Shape: (1, 500)
# I mask out this operation to avoid meddling of NaN values
arr_with_nan = big_A[np.newaxis] - big_C[:, :, :, np.newaxis]
arr_with_nan = np.ma.array(arr_with_nan, mask=np.isnan(arr_with_nan))
# Result for both a and b
big_result = np.linalg.norm(arr_with_nan, axis=-1)
big_result = np.min(big_result[np.newaxis], axis=2)
big_result = np.min(big_result, axis=1)[0] # Shape: (2, 1000), where (1,) -> result_1; (2,) -> result_2
a
and b
correspond to numpy arrays whose dimension in axis=1
may vary. This is why I'm appending them in a big numpy matrix big_A
with np.nan
to fill up the values. Plus, my "operation vector", c
, is being broadcasted to the shape (2, 3, 6, 8)
so that it can be used for both a
and b
inside big_A
. This broadcasting doesn't let me operate big_C
and big_A
immediately, so I had to reshape big_C
to (3, 6, 2, 8)
.
My guess is that I'm dealing with a rounding error. I've tried casting everything with .astype(np.float32 or np.float64)
but it doesn't work. If I subsequently execute my code, it fails. How do I make sure my code works independently of initial values in a
and b
?
EDIT: I forgot to add my precise question.
Upvotes: 1
Views: 96
Reputation: 231385
Don't reshape after broadcast_to
. Transpose if you must, or use big_C[:,:,None,:]
to make a (2,3,1,6,8)
Upvotes: 1