Reputation: 2825
I want to compute the sum product along one dimension of two multidimensional arrays, using Theano.
I'll describe precisely what I want to do using numpy
first. numpy.tensordot
and numpy.dot
seem to always do a matrix product, whereas I'm in essence looking for a batched equivalent of a vector product. Given x
and y
, I want to compute z
like so:
x = np.random.normal(size=(200, 2, 2, 1000))
y = np.random.normal(size=(200, 2, 2))
# this is how I now approach it:
z = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
# z is of shape (200, 2, 1000)
Now I know that numpy.einsum
would probably be able to help me here, but again, I want to do this particular computation in Theano, which does not have an einsum
equivalent. I will need to use dot
, tensordot
, or Theano's specialized einsum subset functions batched_dot
or batched_tensordot
.
The reason I'm looking to change my approach to this is performance; I suspect that using builtin (CUDA) dot products will be faster than relying on broadcasting, element-wise product, and sum.
Upvotes: 4
Views: 901
Reputation: 9075
In Theano, none of the dimensions of three and four dimensional tensors are broadcastable. You have to explicitly set them. Then the Numpy principles will work just fine. One way to do this is to use T.patternbroadcast
. To read more about broadcasting, refer this.
You have three dimensions in one of the tensors. So first you need to append a singleton dimension at the end and then make that dimension broadcastable. These two things can be achieved with a single command - T.shape_padaxis
. The entire code is as follows:
import theano
from theano import tensor as T
import numpy as np
X = T.ftensor4('X')
Y = T.ftensor3('Y')
Y_broadcast = T.shape_padaxis(Y, axis=-1) # appending extra dimension and making it
# broadcastable
Z = T.sum((X*Y_broadcast), axis=1) # element-wise multiplication
f = theano.function([X, Y], Z, allow_input_downcast=True)
# Making sure that it works and gives correct results
x = np.random.normal(size=(3, 2, 2, 4))
y = np.random.normal(size=(3, 2, 2))
theano_result = f(x,y)
numpy_result = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
print np.amax(theano_result - numpy_result) # prints 2.7e-7 on my system, close enough!
I hope this helps.
Upvotes: 1