Reputation: 1386
I have two tensors and a weight matrix in Theano. Tensor A has dimension (k, 5, 40). Tensor B has dimension (k, 5, 40). Weight matrix W has dimension (40,40). I would like to compute AWB. What is the correct sequence of Theano tensor operations to achieve this? Note that k can vary during run time but other dimensions are fixed. The semantics we want from AWB is the following:
Think of A as a collection of k (5,40) matrices. Call them A_1, ..., A_k. Think of B as a collection of k (5,40) matrices. Call them B_1, ..., B_k. We want to find A_{i} * W * B_{i}^{T} for all i from 1 to k, which gives k matrices of shape (5,5). I tried using theano.tensor.dot, but I found it quite confusing.
Note that an inefficient way of doing this is to use the scan function, but that would make inherently parallel code sequential.
Upvotes: 0
Views: 1965
Reputation: 3281
I'm sorry, but I don't understand what you mean by "confusing".
I tried a small case that I hope represents yours: a dot product, using theano.tensor.dot, between a three-dimensional tensor and a two-dimensional matrix:
import numpy as np
import theano
import theano.tensor as T
a = T.tensor3('a', dtype='int64')
c = T.matrix('c',dtype='int64')
d = T.dot(a,c)
g = theano.function([a,c],d)
x = np.array([[[1,2],[1,3]],[[2,2],[1,1]]], dtype=int)
y = np.array([[1,2],[1,3]], dtype=int)
print(g(x, y))
The output:
[[[ 3  8]
  [ 4 11]]

 [[ 4 10]
  [ 2  5]]]
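This matches the logic you describe: the dot product with matrix c acts only on the second and third dimensions, so each 2D slice a[i] is multiplied by c independently and the first dimension is left untouched.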
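The numbers above can be cross-checked in plain NumPy, whose np.dot follows the same rule for a 3D array times a 2D matrix. This is only a sketch for checking the arithmetic; it uses NumPy rather than Theano, and the names full and per_slice are illustrative.

```python
import numpy as np

x = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]])
y = np.array([[1, 2], [1, 3]])

# 3D x 2D dot: the last axis of x is contracted with the first axis of y.
full = np.dot(x, y)

# Equivalent: multiply each (2, 2) slice of x by y separately and restack.
per_slice = np.stack([x_i.dot(y) for x_i in x])

print(full.shape)                        # (2, 2, 2)
print(np.array_equal(full, per_slice))   # True
```

Both routes give the same (2, 2, 2) result, which is why the first step, A*W, needs no loop.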
UPDATE
You can use the first code above for the first operation in your case (A*W). Sorry, I did not think it through carefully: after that operation the output is of course still a three-dimensional tensor, so (AW)*B needs a different approach. To multiply two three-dimensional tensors slice by slice, I usually use scan:
import numpy as np
import theano
import theano.tensor as T
a = T.tensor3('a', dtype='int64')
c = T.tensor3('c',dtype='int64')
d, updates = theano.scan(lambda i: T.dot(a[i], c[i]), sequences=T.arange(a.shape[0]))
g = theano.function([a,c],d)
x = np.array([[[1,2],[1,3]],[[2,2],[1,1]]], dtype=int)
y = np.array([[[1,2],[1,3]],[[2,2],[1,1]]], dtype=int)
print(g(x, y))
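However, there is another approach that avoids scan: theano.tensor.batched_dot. (theano.tensor.dot does not pair up the slices of two 3D tensors; like numpy.dot, it contracts the last axis of the first tensor with the second-to-last axis of the second and returns a 4D result.) In your case it is simple to code like this: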
e = T.batched_dot(a,c)
g = theano.function([a,c],e)
The code above gives the same result as the scan version. Hope it helps.
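Putting the two steps together for the original question, A_{i} * W * B_{i}^{T}, the shapes should work out as (k, 5, 40) times (40, 40) giving (k, 5, 40), followed by a batched product with B transposed on its last two axes, giving (k, 5, 5). The sketch below checks this in NumPy, not Theano; in Theano the corresponding calls would presumably be T.dot(A, W), B.dimshuffle(0, 2, 1) and T.batched_dot, which is an assumption about how the pieces combine rather than something taken from the code above. The value k = 3 and the random data are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
k = 3  # illustrative only; k may vary at run time
A = rng.randn(k, 5, 40)
B = rng.randn(k, 5, 40)
W = rng.randn(40, 40)

# Step 1: A_i * W for every i (3D x 2D dot) -> shape (k, 5, 40)
AW = np.dot(A, W)

# Step 2: batched product with B_i^T -> shape (k, 5, 5).
# np.matmul pairs up slices along the leading axis, like a batched dot.
out = np.matmul(AW, B.transpose(0, 2, 1))

# Reference: the explicit (sequential) loop over i.
ref = np.stack([A[i].dot(W).dot(B[i].T) for i in range(k)])

print(out.shape)
print(np.allclose(out, ref))
```

The loop is only there as a reference; the two vectorised steps never iterate over k, which is the property the question asks for.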
Upvotes: 1