tourist

Reputation: 4333

Recurrent neural network architecture

I'm working on an RNN architecture that does speech enhancement. The dimensions of the input are [XX, X, 1024], where XX is the batch size and X is the variable sequence length.

The input to the network is positive-valued data and the output is a binary mask (ideal binary mask, IBM) which is later used to construct the enhanced signal.

For instance, if the input to the network is a [10, 65, 1024] tensor, the output will be a [10, 65, 1024] tensor with binary values. I'm using TensorFlow with mean squared error as the loss function, but I'm not sure which activation function to use here (one that keeps the outputs either zero or one). Following is the code I've come up with so far:

import tensorflow as tf

tf.reset_default_graph()
num_units = 10   # hidden units per LSTM cell
num_layers = 3   # number of stacked LSTM layers
dropout = tf.placeholder(tf.float32)  # output keep probability for the dropout wrapper

cells = []
for _ in range(num_layers):
    cell = tf.contrib.rnn.LSTMCell(num_units)
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
    cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)

X = tf.placeholder(tf.float32, [None, None, 1024])
Y = tf.placeholder(tf.float32, [None, None, 1024])

output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

out_size = Y.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size)
prediction = logit

flat_Y = tf.reshape(Y, [-1] + Y.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])

loss_op = tf.losses.mean_squared_error(labels=flat_Y, predictions=flat_logit)  

# Adam optimizer as the optimization function
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss_op)

# extract the correct predictions and compute the accuracy
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

Also, my reconstruction isn't good. Can someone suggest how to improve the model?

Upvotes: 2

Views: 106

Answers (1)

rvinas

Reputation: 11895

If you want your outputs to be either 0 or 1, it seems like a good idea to me to turn this into a classification problem. To this end, I would use a sigmoid activation and cross-entropy:

...
prediction = tf.nn.sigmoid(logit)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))
...
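One thing to double-check: tf.contrib.layers.fully_connected applies a ReLU activation by default, so for the cross-entropy above to receive real logits you would want to build that layer with activation_fn=None. Putting the pieces together, here is a minimal sketch of how the output stage could look, reusing output, out_size and Y from your code; the 0.5 threshold for recovering the binary mask and the element-wise accuracy are illustrative choices on my part, not something fixed by the question:

# Linear projection to out_size (=1024) values per time step; no ReLU, so these are real logits
logit = tf.contrib.layers.fully_connected(output, out_size, activation_fn=None)

# Sigmoid maps each logit to a probability in (0, 1)
prediction = tf.nn.sigmoid(logit)

# Cross-entropy between the target mask Y and the logits
loss_op = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))

# Threshold at 0.5 (illustrative) to obtain the binary mask used for reconstruction
binary_mask = tf.cast(tf.greater(prediction, 0.5), tf.float32)

# Element-wise agreement with the target mask
accuracy = tf.reduce_mean(tf.cast(tf.equal(binary_mask, Y), tf.float32))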

In addition, from my point of view the hidden dimensionality (10) of your stacked RNNs seems quite small for such a large input dimensionality (1024). However, this is just a guess, and it is something that needs to be tuned.
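For instance (256 is an arbitrary illustration, not a recommendation), widening the cells is a one-line change in your code, and the right value has to be found on a validation set:

num_units = 256  # illustrative only; tune on held-out data
cell = tf.contrib.rnn.LSTMCell(num_units)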

Upvotes: 2
