Reputation: 767
I am trying to build the CLDNN that is researched in the paper here.
After the convolutional layers, the features go through a dimensionality-reduction layer. At the point where the features leave the conv layers, the dimensions are [?, N, M]. N represents the number of windows, and I think the network requires the reduction to happen along the dimension M, so the dimensions of the features after the dim-reduction layer are [?, N, Q], where Q < M.
I have two questions.
1. How do I do this in TensorFlow? I tried using a weight matrix
W = tf.Variable(tf.truncated_normal([M, Q], stddev=0.1))
I thought tf.matmul(x, W) would yield [?, N, Q], but [?, N, M] and [M, Q] are not valid dimensions for matrix multiplication. I would like to keep N constant and reduce the dimension from M to Q (see the minimal sketch after this list).
2. What kind of non-linearity should I apply to the outcome of tf.matmul(x, W)? I was thinking about using a ReLU, but I couldn't even get #1 done.
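For concreteness, here is a minimal sketch of the failing attempt from question 1 (all sizes are made up for illustration):
import tensorflow as tf

N, M, Q = 11, 256, 40  # hypothetical sizes, only for illustration
x = tf.placeholder(tf.float32, [None, N, M])  # stand-in for the conv-layer output
W = tf.Variable(tf.truncated_normal([M, Q], stddev=0.1))
y = tf.matmul(x, W)  # raises an error: rank-3 tensor vs. rank-2 weight matrix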
Upvotes: 2
Views: 5019
Reputation: 14336
According to the linked paper (T. N. Sainath et al.: "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks"),
[...] reducing the dimensionality, such that we have 256 outputs from the linear layer, was appropriate.
That means that whatever the input size is, i.e. [?, N, M] or any other dimensionality (always assuming that the first dimension is the number of samples in a mini-batch, denoted by ?), the output will be [?, Q], where typically Q=256.
Since we are doing the dimensionality reduction by multiplying the input with a weight matrix, no spatial information will be preserved. This means it doesn't matter whether each input is a matrix or a vector, so we can reshape the input to the linear layer x to have the dimensions [?, N*M]. Then we can perform a simple matrix multiplication tf.matmul(x, W), where W is a matrix with the dimensions [N*M, Q]:
import tensorflow as tf

W = tf.Variable(tf.truncated_normal([N * M, Q], stddev=0.1))  # learned projection weights
x_vec = tf.reshape(x, shape=(-1, N * M))  # flatten each [N, M] feature map to a vector
y = tf.matmul(x_vec, W)  # output has shape [?, Q]
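As a quick shape check, the same snippet can be run end to end with made-up sizes (N and M below are assumptions for illustration; only Q=256 comes from the paper):
import tensorflow as tf

N, M, Q = 11, 256, 256  # N, M hypothetical; the paper uses 256 outputs
x = tf.placeholder(tf.float32, [None, N, M])  # stand-in for the conv-layer output
W = tf.Variable(tf.truncated_normal([N * M, Q], stddev=0.1))
x_vec = tf.reshape(x, shape=(-1, N * M))
y = tf.matmul(x_vec, W)
print(y.shape)  # (?, 256) -- the batch dimension stays unknown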
Finally, regarding question 2: in the paper, the dimensionality reduction layer is a linear layer, i.e. you do not apply a non-linearity to the output.
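If you prefer a higher-level API, the same linear layer can be written in TF 1.x with tf.layers.dense and activation=None (note that, unlike the manual version above, this variant also adds a bias term by default):
y = tf.layers.dense(x_vec, units=Q, activation=None)  # linear layer: no non-linearity applied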
Upvotes: 2