hockeybro

Reputation: 1001

How to Have Multiple Softmax Outputs in Tensorflow?

I am trying to create a network in TensorFlow with multiple softmax outputs, each of a different size. The network architecture is: Input -> LSTM -> Dropout. Then I have two softmax layers: a softmax of 10 outputs and a softmax of 20 outputs. The reason for this is that I want to generate two sets of outputs (10 and 20) and then combine them to produce a final output. I'm not sure how to do this in TensorFlow.

Previously, to make a network like the one described, but with a single softmax, I think I could do something like this.

# Token ids for the embedding lookup below, plus the true length of each sequence
inputs = tf.placeholder(tf.int32, [batch_size, maxlength])
lengths = tf.placeholder(tf.int32, [batch_size])
embeddings = tf.Variable(tf.random_uniform([vocabsize, 256], -1, 1))
lstm = {}
lstm[0] = tf.contrib.rnn.LSTMCell(hidden_layer_size, state_is_tuple=True, initializer=tf.contrib.layers.xavier_initializer(seed=random_seed))
lstm[0] = tf.contrib.rnn.DropoutWrapper(lstm[0], output_keep_prob=0.5)
lstm[0] = tf.contrib.rnn.MultiRNNCell(cells=[lstm[0]] * 1, state_is_tuple=True)
output_layer = {}
output_layer[0] = Layer.W(1 * hidden_layer_size, 20, 'OutputLayer')
output_bias = {}
output_bias[0] = Layer.b(20, 'OutputBias')
outputs = {}
fstate = {}
with tf.variable_scope("lstm0"):
    # create the rnn graph at run time
  outputs[0], fstate[0] = tf.nn.dynamic_rnn(lstm[0], tf.nn.embedding_lookup(embeddings, inputs),
                                      sequence_length=lengths, 
                                      dtype=tf.float32)
logits = {}
logits[0] = tf.matmul(tf.concat([f.h for f in fstate[0]], 1), output_layer[0]) + output_bias[0]
loss = {}
loss[0] = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits[0], labels=labels[0]))

However, now I want my RNN output (after the dropout) to flow into two softmax layers, one of size 10 and another of size 20. Does anyone have an idea of how to do this?

Thanks

Edit: Ideally I would like to use a version of softmax such as the one defined in this Knet Julia library. Does TensorFlow have an equivalent? https://github.com/denizyuret/Knet.jl/blob/1ef934cc58f9671f2d85063f88a3d6959a49d088/deprecated/src7/op/actf.jl#L103

Upvotes: 7

Views: 4452

Answers (2)

Pop

Reputation: 12411

You can do the following with the output of dynamic_rnn (the concatenated final state that you feed into logits[0] in your question) in order to compute the two softmaxes and their corresponding losses:

with tf.variable_scope("softmax_0"):
    # Transform you RNN output to the right output size = 10
    W = tf.get_variable("kernel_0", [output[0].get_shape()[1], 10])
    logits_0 = tf.matmul(inputs, W)
    # Apply the softmax function to the logits (of size 10)
    output_0 = tf.nn.softmax(logits_0, name = "softmax_0")
    # Compute the loss (as you did in your question) with softmax_cross_entropy_with_logits directly applied on logits
    loss_0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_0, labels=labels[0]))

with tf.variable_scope("softmax_1"):  
    # Transform you RNN output to the right output size = 20
    W = tf.get_variable("kernel_1", [output[0].get_shape()[1], 20])
    logits_1 = tf.matmul(inputs, W)
    # Apply the softmax function to the logits (of size 20)
    output_1 = tf.nn.softmax(logits_1, name = "softmax_1")
    # Compute the loss (as you did in your question) with softmax_cross_entropy_with_logits directly applied on logits
    loss_1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_1, labels=labels[1]))

You can then combine the two losses if it is relevant to your application:

total_loss = loss_0 + loss_1
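
A single train op can then minimize both heads at once, since total_loss depends on the weights of both softmax layers. As a minimal sketch (the optimizer choice and learning rate below are illustrative assumptions, not part of the original code):

# Minimal sketch: assumed Adam optimizer and learning rate (illustrative values only).
# One train op updates the LSTM and both softmax heads, because total_loss
# depends on the variables of all of them.
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(total_loss)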

EDIT To answer your question in the comments about what you specifically need to do with the two softmax outputs: you can do approximately the following:

with tf.variable_scope("second_part"):
    W1 = tf.get_variable("W_1", [output_1.get_shape()[1], n])
    W2 = tf.get_variable("W_2", [output_2.get_shape()[1], n])
    prediction = tf.matmul(output_1, W1) + tf.matmul(output_2, W2)
with tf.variable_scope("optimization_part"):
    loss = tf.reduce_mean(tf.squared_difference(prediction, label))

You just need to define n, the number of columns of W1 and W2.
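
For instance (an illustrative assumption, not something specified in the question), if the combined prediction is a single scalar per example, n and the label placeholder used above could be defined like this:

# Illustrative only: assumes the final combined prediction is one scalar per example.
n = 1
label = tf.placeholder(tf.float32, [batch_size, n])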

Upvotes: 4

Neeraj Kashyap

Reputation: 130

You aren't defining the logits for the size-10 softmax layer in your code, and you would have to do that explicitly.

Once that is done, you can use tf.nn.softmax, applying it separately to each of your logit tensors.

For example, for your 20-class softmax tensor:

softmax20 = tf.nn.softmax(logits[0])

For the other layer, you could do:

output_layer[1] = Layer.W(1 * hidden_layer_size, 10, 'OutputLayer10')
output_bias[1] = Layer.b(10, 'OutputBias10')

logits[1] = tf.matmul(tf.concat([f.h for f in fstate[0]], 1),
                      output_layer[1]) + output_bias[1]

softmax10 = tf.nn.softmax(logits[1])

There is also a tf.contrib.layers.softmax, which allows you to apply the softmax over the final axis of a tensor with more than 2 dimensions, but it doesn't look like you need anything like that. tf.nn.softmax should work here.
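
For completeness, here is a hedged illustration of that multi-dimensional case (the per-timestep dense projection is an assumed example, not something your network needs):

# Not needed for this setup; just illustrating a softmax over the last axis of a 3-D tensor.
# outputs[0] from dynamic_rnn has shape [batch_size, maxlength, hidden_layer_size].
per_step_logits = tf.layers.dense(outputs[0], 20)             # assumed per-timestep projection
per_step_probs = tf.contrib.layers.softmax(per_step_logits)   # softmax applied over the final axis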

Side note: output_layer is not the greatest name for that dict - it should be something involving weights. These weights and biases (output_layer, output_bias) also do not represent the output layer of your network, as that will come from whatever you do to your softmax outputs, right? [Sorry, couldn't help myself.]

Upvotes: 5
