Akhmad Zaki

Reputation: 433

Faster 3D Matrix Operation - Python

I am working with a 3D matrix in Python. For example, given a matrix of size 2x3x4 like this:

[[[1 2 1 4]
  [3 2 1 1]
  [4 3 1 4]]

 [[2 1 3 3]
  [1 4 2 1]
  [3 2 3 3]]]

I need to compute the entropy of each row in each 2D slice of the matrix. For example, for row 1 of slice 1 of the matrix above, [1,2,1,4], the normalized values (scaled so that they sum to 1) are [0.125, 0.25, 0.125, 0.5], and the entropy is given by the formula -sum(p*log(p)), where p ranges over the normalized values. The result is a 2x3 matrix: each of the 2 slices contributes 3 entropy values, one per row.
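As a sanity check on the formula, the entropy of that first row can be worked out directly. This is a minimal sketch that assumes the natural logarithm (the default base of scipy.stats.entropy); the names row, p and h are only illustrative.

```python
import numpy as np

# First row of the first slice from the example matrix
row = np.array([1, 2, 1, 4], dtype=float)

# Normalize so the values sum to 1: [0.125, 0.25, 0.125, 0.5]
p = row / row.sum()

# Entropy with the natural logarithm: -sum(p * log(p))
h = -np.sum(p * np.log(p))
print(p, h)
```

For this row the result equals 1.75 * log(2), since the normalized values are all powers of 1/2.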

Here is a working example of my code, using a random matrix each time:

from scipy.stats import entropy
import numpy as np

matrix = np.random.randint(low=1,high=5,size=(2,3,4)) # what if size is (200,50,1000)?
entropy_matrix=np.zeros((matrix.shape[0],matrix.shape[1]))
for i in range(matrix.shape[0]):
    normalized = np.array([float(k)/np.sum(j) for j in matrix[i] for k in j]).reshape(matrix.shape[1],matrix.shape[2])
    entropy_matrix[i] = np.array([entropy(m) for m in normalized])

My question is: how do I scale this program up to work with a very large 3D matrix (for example, of size 200x50x1000)?

I am using Python on Windows 10 (with the Anaconda distribution). With a 3D matrix of size 200x50x1000, the running time is 290 s on my computer.

Upvotes: 2

Views: 374

Answers (1)

Divakar

Reputation: 221514

Using the definition of entropy for the second part and a broadcasted operation for the first part (the normalization), one vectorized solution would be:

p1 = matrix/matrix.sum(-1,keepdims=True).astype(float)
entropy_matrix_out = -np.sum(p1 * np.log(p1), axis=-1)

Alternatively, we can use einsum for the second part for a further performance boost:

entropy_matrix_out = -np.einsum('ijk,ijk->ij',p1,np.log(p1),optimize=True)
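To check that both vectorized variants agree with a plain row-by-row computation, something like the following sketch can be used. It assumes Python 3 (so integer division already yields floats) and NumPy 1.17 or later for np.random.default_rng. The function names and the smaller test size are chosen for illustration only. It also assumes every entry is strictly positive, as in the question, because p * log(p) evaluates to nan where p is 0.

```python
import numpy as np

# Smaller than 200x50x1000 so the loop reference finishes quickly
rng = np.random.default_rng(0)
matrix = rng.integers(low=1, high=5, size=(20, 5, 100))

def entropy_loop(m):
    """Reference: compute the entropy one row at a time."""
    out = np.zeros(m.shape[:2])
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            p = m[i, j] / m[i, j].sum()
            out[i, j] = -np.sum(p * np.log(p))
    return out

def entropy_broadcast(m):
    """Normalize along the last axis by broadcasting, then reduce."""
    p = m / m.sum(axis=-1, keepdims=True)
    return -np.sum(p * np.log(p), axis=-1)

def entropy_einsum(m):
    """Same normalization, with the sum-of-products done by einsum."""
    p = m / m.sum(axis=-1, keepdims=True)
    return -np.einsum('ijk,ijk->ij', p, np.log(p), optimize=True)

reference = entropy_loop(matrix)
print(np.allclose(entropy_broadcast(matrix), reference))
print(np.allclose(entropy_einsum(matrix), reference))
```

Timings depend on the machine and the NumPy build, so it is worth measuring both variants on the real 200x50x1000 input before picking one.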

Upvotes: 1
