adir abargil

Reputation: 5745

Transform broadcasting into something calculable (matrix np.multiply)

I'm trying to perform this kind of calculation:

import numpy as np

arr = np.arange(4)
# array([0, 1, 2, 3])

arr_t =arr.reshape((-1,1))
# array([[0],
#        [1],
#        [2],
#        [3]])

mult_arr = np.multiply(arr,arr_t) # <<< the multiplication
# array([[0, 0, 0, 0],
#        [0, 1, 2, 3],
#        [0, 2, 4, 6],
#        [0, 3, 6, 9]])

Eventually I want to perform it on every single row of a bigger matrix, and sum all the matrices produced by the calculation:

arr = np.random.random((600,150))

arr_t =arr.reshape((-1,arr.shape[1],1))


mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
summed

Up to here it's all pure awesomeness. The problem starts when I try to switch to a bigger dataset, for example this array instead:

arr = np.random.random((6000,1500))

I get the following error: MemoryError: Unable to allocate 101. GiB for an array with shape (6000, 1500, 1500) and data type float64. That makes sense, but my question is:

Can I get around this somehow without being forced to use loops that slow down the whole process?

My question is mainly about performance; a solution that runs for more than 30 seconds is not an option.
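For reference, the size in the error message can be reproduced with a little arithmetic. The sketch below assumes float64 (8 bytes per element) and uses the shapes from the question; the variable names are only illustrative. It also shows how small the final summed matrix is compared with the broadcast intermediate.

```python
import numpy as np

n_rows, n_cols = 6000, 1500
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

# Broadcast intermediate has shape (n_rows, n_cols, n_cols)
intermediate_bytes = n_rows * n_cols * n_cols * itemsize
# Final summed result has shape (n_cols, n_cols)
result_bytes = n_cols * n_cols * itemsize

intermediate_gib = intermediate_bytes / 2**30
result_mib = result_bytes / 2**20
print(f"intermediate: {intermediate_gib:.1f} GiB, result: {result_mib:.1f} MiB")
```

So the memory pressure comes entirely from materialising the intermediate array, not from the result itself.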

Upvotes: 0

Views: 39

Answers (1)

mozway

Reputation: 260640

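It looks like you are simply trying to compute a matrix product (summing each row's outer product with itself is the same as multiplying the transposed array by the array):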

arr.T@arr

or

arr.T.dot(arr)

Checking that this is what you want:

arr = np.random.random((600,150))

arr_t =arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)

np.allclose((arr.T@arr), summed)
# True
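To see why the two agree, here is a minimal sketch on a small array (the helper name sum_of_outer_products and the test shape are hypothetical, chosen only for illustration). It compares the matrix product against an explicit loop over rows and against an equivalent np.einsum formulation; none of the three builds the (rows, cols, cols) intermediate.

```python
import numpy as np

def sum_of_outer_products(arr):
    """Sum over rows r of outer(arr[r], arr[r]), computed as one matrix product."""
    return arr.T @ arr

rng = np.random.default_rng(0)
small = rng.random((50, 7))

# Reference: accumulate one outer product per row (slow, but memory-light)
reference = np.zeros((small.shape[1], small.shape[1]))
for row in small:
    reference += np.outer(row, row)

# Same contraction written with einsum: sum over i of arr[i, j] * arr[i, k]
via_einsum = np.einsum("ij,ik->jk", small, small)

via_matmul = sum_of_outer_products(small)
print(np.allclose(via_matmul, reference), np.allclose(via_einsum, reference))
```

The matrix product is usually the one to prefer, since it is typically dispatched to an optimised BLAS routine, though actual timings depend on the machine and the NumPy build.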

Upvotes: 1
