Sahil

Reputation: 1386

How to multiply Tensors in Theano

I have two tensors and a weight matrix in Theano. Tensor A has shape (k, 5, 40), tensor B has shape (k, 5, 40), and the weight matrix W has shape (40, 40). I would like to compute AWB. What is the correct sequence of Theano tensor operations to achieve this? Note that k can vary at run time, but the other dimensions are fixed. The semantics we want from AWB are the following:

Think of A as a collection of k (5,40) matrices; call them A_1, ..., A_k. Think of B as a collection of k (5,40) matrices; call them B_1, ..., B_k. We want to find A_{i} * W * B_{i}^{T} (a (5,5) matrix) for every i from 1 to k. I tried using theano.tensor.dot, but it seems quite confusing.

An inefficient way of doing this is to use the scan function, but that would make inherently parallel code sequential.
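To pin down the intended result, here is a minimal NumPy reference (not Theano), assuming float arrays with the shapes from the question and an arbitrary k chosen only for illustration. The loop is the sequential, scan-like definition; `np.einsum` expresses the same per-slice product A_i W B_i^T in a single vectorized call and yields a (k, 5, 5) array.

```python
import numpy as np

def awbt_loop(A, W, B):
    """Reference: compute A[i] @ W @ B[i].T for each i, one slice at a time."""
    return np.stack([A[i] @ W @ B[i].T for i in range(A.shape[0])])

def awbt_einsum(A, W, B):
    """Same result in one vectorized call.

    out[k, i, m] = sum over j, l of A[k, i, j] * W[j, l] * B[k, m, l]
    """
    return np.einsum('kij,jl,kml->kim', A, W, B)

rng = np.random.default_rng(0)
k = 3  # illustrative value; k may vary at run time
A = rng.standard_normal((k, 5, 40))
B = rng.standard_normal((k, 5, 40))
W = rng.standard_normal((40, 40))

ref = awbt_loop(A, W, B)
vec = awbt_einsum(A, W, B)
print(ref.shape)              # (3, 5, 5)
print(np.allclose(ref, vec))  # True
```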

Upvotes: 0

Views: 1965

Answers (1)

malioboro

Reputation: 3281

I'm sorry, but I'm not sure what you mean by "confusing".

I've tried a small example that I hope represents your case: a dot product, using theano.tensor.dot, between a three-dimensional tensor and a two-dimensional matrix:

import numpy as np
import theano
import theano.tensor as T

# a: 3-D tensor, c: 2-D matrix, both int64
a = T.tensor3('a', dtype='int64')
c = T.matrix('c', dtype='int64')
d = T.dot(a, c)  # contracts the last axis of a with the first axis of c

g = theano.function([a, c], d)

x = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]], dtype='int64')
y = np.array([[1, 2], [1, 3]], dtype='int64')
print(g(x, y))

The output:

[[[ 3  8]
  [ 4 11]]

 [[ 4 10]
  [ 2  5]]]

This matches your semantics: T.dot contracts the last axis of a with the first axis of c, so every (2, 2) slice a[i] is multiplied by the same matrix c.
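As far as I know, Theano's T.dot follows numpy.dot semantics for N-dimensional inputs, so the printed result can be sanity-checked without building a graph. The sketch below is a NumPy stand-in, not the Theano code itself; it reproduces the output above and confirms that the 3-D × 2-D product equals multiplying each slice x[i] by y.

```python
import numpy as np

x = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]], dtype='int64')
y = np.array([[1, 2], [1, 3]], dtype='int64')

# numpy.dot contracts the last axis of x with the first axis of the 2-D y,
# so every slice x[i] is multiplied by the same matrix y.
d = np.dot(x, y)
per_slice = np.stack([x[i] @ y for i in range(x.shape[0])])

print(d)
print(np.array_equal(d, per_slice))  # True
```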

UPDATE

You can use the first code example for the first operation in your case (A*W). Sorry, I didn't calculate carefully: of course, after that operation the output is a three-dimensional tensor, so to perform (AW)*B you need a different approach. To multiply two three-dimensional tensors slice by slice, I usually use scan:

import numpy as np
import theano
import theano.tensor as T

a = T.tensor3('a', dtype='int64')
c = T.tensor3('c', dtype='int64')
# Loop over the first (batch) axis; its length comes from a.shape[0], so k may vary.
d, updates = theano.scan(lambda i: T.dot(a[i], c[i]),
                         sequences=T.arange(a.shape[0]))
g = theano.function([a, c], d)

x = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]], dtype='int64')
y = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]], dtype='int64')
print(g(x, y))
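For reference, the scan is a loop over the first axis that multiplies matching slices. The NumPy sketch below (a stand-in for checking expected output, not Theano) spells out that loop with the same x and y and compares it with np.matmul, which treats the leading axis as a batch dimension.

```python
import numpy as np

x = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]], dtype='int64')
y = np.array([[[1, 2], [1, 3]], [[2, 2], [1, 1]]], dtype='int64')

# Sequential version: what the scan does, one slice per step.
looped = np.stack([np.dot(x[i], y[i]) for i in range(x.shape[0])])

# Vectorized version: matmul broadcasts over the leading (batch) axis.
batched = np.matmul(x, y)

print(looped)
print(np.array_equal(looped, batched))  # True
```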

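But there is another approach: theano.tensor.batched_dot, which multiplies the two tensors slice by slice along the first (batch) axis. (T.dot on two 3-D tensors is not what you want here: like numpy.dot, it sums over the last axis of the first and the second-to-last axis of the second for every pair of slices, not just matching ones.) In your case the code is simply: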

e = T.batched_dot(a,c)
g = theano.function([a,c],e)

The code above gives the same results. Hope it helps.
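Putting the two steps together for the original shapes: first T.dot(A, W) (3-D × 2-D, as in the first example), then a batched product with each slice of B transposed, which in Theano I would expect to write as T.batched_dot(AW, B.dimshuffle(0, 2, 1)). That Theano line is untested here. The NumPy sketch below mirrors the same two steps (np.dot, then np.matmul against B.transpose(0, 2, 1)); the helper name awbt_two_step is hypothetical, and the inputs are random floats with a few illustrative values of k. It checks the result against the per-slice definition A_i W B_i^T.

```python
import numpy as np

def awbt_two_step(A, W, B):
    """Hypothetical helper: A[i] @ W @ B[i].T for every i, without a Python loop."""
    # Step 1: (k, 5, 40) x (40, 40) -> (k, 5, 40); W is applied to every slice.
    AW = np.dot(A, W)
    # Step 2: transpose each B slice to (40, 5), then multiply slice by slice.
    return np.matmul(AW, B.transpose(0, 2, 1))  # -> (k, 5, 5)

rng = np.random.default_rng(1)
for k in (1, 4, 7):  # k varies at run time; the other dimensions are fixed
    A = rng.standard_normal((k, 5, 40))
    B = rng.standard_normal((k, 5, 40))
    W = rng.standard_normal((40, 40))
    out = awbt_two_step(A, W, B)
    expected = np.stack([A[i] @ W @ B[i].T for i in range(k)])
    assert out.shape == (k, 5, 5)
    assert np.allclose(out, expected)
print("ok")
```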

Upvotes: 1
