Camilo Martínez M.
Camilo Martínez M.

Reputation: 1565

Series of numpy operations seem to depend on numpy arrays initialization

I'm in the process of vectorizing a series of operations with numpy arrays. And in doing that, I'm getting different results in different executions of my code. I'm using numpy.random.randn to get numpy arrays of the shapes that I need and testing if my operations do what I want them to do. I sometimes get the desired results and sometimes I don't, even though it's the same series of operations.

In theory, the first row of big_result should be equal to result_1, while the second row equal to result_2. This happens sometimes. Other times the first row is correct, while the second one isn't. And other times, both are erroneus.

This is the code to reproduce the problem (I added some seeds I got by hand, in which the error happens and some where it doesn't):

import numpy as np

np.random.seed(2) # First row of big_result corresponds to result_1, the rest is wrong
np.random.seed(3) # First row of big_result corresponds to result_1, the rest is wrong
np.random.seed(4) # Both rows of big_result are ok

# My two vectors
a = np.random.randn(1000, 8)
b = np.random.randn(500, 8)

# Operation vector
c = np.random.randn(3, 6, 8)

# Matrix with both a, b
big_A = np.full((2, 1000, 8), np.nan)
big_A[0, :, :] = a
big_A[1, :500, :] = b

# Broadcasting used to enable the use of c with big_A
big_C = np.broadcast_to(c, (2, 3, 6, 8)).reshape((3, 6, 2, 8))

# First result (for a)
result_1 = np.linalg.norm(a[:, np.newaxis] - c[:, np.newaxis, :], axis=-1)
result_1 = np.min(result_1[np.newaxis], axis=(-1, 1)) # Shape: (1, 1000)

# Second result (for b)
result_2 = np.linalg.norm(b[:, np.newaxis] - c[:, np.newaxis, :], axis=-1)
result_2 = np.min(result_2[np.newaxis], axis=(-1, 1)) # Shape: (1, 500)

# I mask out this operation to avoid meddling of NaN values
arr_with_nan = big_A[np.newaxis] - big_C[:, :, :, np.newaxis]
arr_with_nan = np.ma.array(arr_with_nan, mask=np.isnan(arr_with_nan))

# Result for both a and b 
big_result = np.linalg.norm(arr_with_nan, axis=-1)
big_result = np.min(big_result[np.newaxis], axis=2)
big_result = np.min(big_result, axis=1)[0] # Shape: (2, 1000), where (1,) -> result_1; (2,) -> result_2

Some remarks

a and b correspond to numpy arrays whose dimension in axis=1 may vary. This is why I'm appending them in a big numpy matrix big_A with np.nan to fill up the values. Plus, my "operation vector", c, is being broadcasted to the shape (2, 3, 6, 8) so that it can be used for both a and b inside big_A. This broadcasting doesn't let me operate big_C and big_A immediately, so I had to reshape big_C to (3, 6, 2, 8).

My guess is that I'm dealing with a rounding error. I've tried casting everything with .astype(np.float32 or np.float64) but it doesn't work. If I subsequently execute my code, it fails. How do I make sure my code works independently of initial values in a and b?

EDIT: I forgot to add my precise question.

Upvotes: 1

Views: 96

Answers (1)

hpaulj
hpaulj

Reputation: 231385

Don't reshape after broadcast_to. Transpose if you must, or use big_C[:,:,None,:] to make a (2,3,1,6,8)

Upvotes: 1

Related Questions