with tf.variable_scope("rnn_seq2seq"):
    w = tf.get_variable("proj_w", [num_units, seq_width])
    w_t = tf.transpose(w)
    b = tf.get_variable("proj_b", [seq_width])
    output_projection=(w,b)
    output,state = rnn_seq2seq(enc_inputs,dec_inputs,cell,output_projection=output_projection,feed_previous=False)
    weights=[tf.ones([batch_size * dec_steps])]
    loss=[]
    for i in xrange(dec_steps -1):
        logits = tf.nn.xw_plus_b(output[i],output_projection[0],output_projection[1])
If I introduce one-hot encoding on the logits here, the program gives an error later, even though both tensors have the same dimensions. If I comment out this line, the program does not give any error.
        prev = logits
        logits = tf.to_float(tf.equal(prev,tf.reduce_max(prev,reduction_indices=[1],keep_dims=True)))
        print prev
        print logits
Tensor("rnn_seq2seq/xw_plus_b:0", shape=TensorShape([Dimension(800), Dimension(14)]), dtype=float32)
Tensor("rnn_seq2seq/ToFloat:0", shape=TensorShape([Dimension(800), Dimension(14)]), dtype=float32)
Rest of code:
        crossent = tf.nn.softmax_cross_entropy_with_logits(logits,dec_inputs[i+1],name="SequenceLoss/CrossEntropy{0}".format(i))
        loss.append(crossent)
cost = tf.reduce_sum(tf.add_n(loss))
final_state = state[-1]
tvars = tf.trainable_variables()
grads,norm = tf.clip_by_global_norm(tf.gradients(cost,tvars),5)
lr = tf.Variable(0.0,name="learningRate")
optimizer = tf.train.GradientDescentOptimizer(lr)
train_op = optimizer.apply_gradients(zip(grads,tvars))
---> 23 grads,norm = tf.clip_by_global_norm(tf.gradients(cost,tvars),5)
ValueError: List argument 'values' to 'Pack' Op with length 0 shorter than minimum length 1.
Upvotes: 3
Views: 1055
Neural networks can only be trained if all the operations they perform are differentiable. The "one-hot" step you apply is not differentiable, so such a network cannot be trained with any gradient-descent-based optimizer (i.e. any optimizer that TensorFlow implements).
The general approach is to use softmax (which is differentiable) during training as an approximation of one-hot encoding. Your model already applies softmax to the logits inside softmax_cross_entropy_with_logits, so commenting out the "one-hot" line is all you need to do.
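For illustration, here is a minimal sketch (assuming the legacy TF 1.x graph API; the shapes, names and keyword-argument call style are made up for the example, not taken from your code) that contrasts the two paths. With raw logits the loss has a gradient; with the hard one-hot, tf.gradients returns None for everything upstream, so clip_by_global_norm is left with an empty list of gradients, which lines up with the Pack error you see:
import tensorflow as tf

logits = tf.Variable(tf.random_normal([4, 14]))            # stand-in for the xw_plus_b output
labels = tf.one_hot(tf.constant([1, 3, 5, 7]), depth=14)   # stand-in for dec_inputs[i+1]

# Differentiable path: feed the raw logits straight into softmax cross-entropy.
soft_loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
print(tf.gradients(soft_loss, [logits]))   # [<tf.Tensor ...>] -- a gradient exists

# Non-differentiable path: hard one-hot via equality with the row maximum.
hard = tf.to_float(tf.equal(logits, tf.reduce_max(logits, axis=1, keep_dims=True)))
hard_loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=hard, labels=labels))
print(tf.gradients(hard_loss, [logits]))   # [None] -- tf.equal has no gradient, the path is cut
If you need hard one-hot predictions, compute them at decoding/inference time only; during training keep the raw logits going into the loss.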
Upvotes: 3