Sameer Kumar

Reputation: 93

Adding one-hot encoding throws an error in previously working code in TensorFlow

with tf.variable_scope("rnn_seq2seq"):

    w = tf.get_variable("proj_w", [num_units, seq_width])
    w_t = tf.transpose(w)
    b = tf.get_variable("proj_b", [seq_width])
    output_projection = (w, b)

    output, state = rnn_seq2seq(enc_inputs, dec_inputs, cell,
                                output_projection=output_projection,
                                feed_previous=False)

    weights = [tf.ones([batch_size * dec_steps])]
    loss = []
    for i in xrange(dec_steps - 1):
        logits = tf.nn.xw_plus_b(output[i], output_projection[0], output_projection[1])

If I introduce one-hot encoding on the logits here, the program gives an error later, even though both tensors have the same dimensions. If I comment out this line, the program does not give any error.

        prev = logits
        logits = tf.to_float(tf.equal(prev, tf.reduce_max(prev, reduction_indices=[1], keep_dims=True)))
        print prev
        print logits

Tensor("rnn_seq2seq/xw_plus_b:0", shape=TensorShape([Dimension(800), Dimension(14)]), dtype=float32)

Tensor("rnn_seq2seq/ToFloat:0", shape=TensorShape([Dimension(800), Dimension(14)]), dtype=float32)

Rest of code:

    crossent = tf.nn.softmax_cross_entropy_with_logits(
        logits, dec_inputs[i + 1], name="SequenceLoss/CrossEntropy{0}".format(i))
    loss.append(crossent)

cost = tf.reduce_sum(tf.add_n(loss))
final_state = state[-1]
tvars = tf.trainable_variables()

grads, norm = tf.clip_by_global_norm(tf.gradients(cost, tvars), 5)
lr = tf.Variable(0.0, name="learningRate")
optimizer = tf.train.GradientDescentOptimizer(lr)
train_op = optimizer.apply_gradients(zip(grads, tvars))

---> 23 grads,norm = tf.clip_by_global_norm(tf.gradients(cost,tvars),5)

ValueError: List argument 'values' to 'Pack' Op with length 0 shorter than minimum length 1.

Upvotes: 3

Views: 1055

Answers (1)

Ishamael

Reputation: 12795

Neural networks can only be trained if all the operations they perform are differentiable. The "one-hot" step you apply is not differentiable, so such a network cannot be trained with any gradient-descent-based optimizer (i.e., any optimizer that TensorFlow implements).
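Concretely, the hard arg-max built from tf.equal and tf.reduce_max has no gradient registered, so tf.gradients(cost, tvars) returns None for every trainable variable, and tf.clip_by_global_norm is then left with an empty list to pack, which matches the ValueError in your traceback. A minimal sketch of that effect (the variable names here are made up for illustration, written against the same TF 1.x-era graph API as your code):

    import tensorflow as tf

    # Toy graph: a small projection followed by the same hard arg-max trick.
    x = tf.constant([[0.5, -1.0, 2.0]])
    w = tf.Variable(tf.ones([3, 4]))
    b = tf.Variable(tf.zeros([4]))

    logits = tf.nn.xw_plus_b(x, w, b)          # differentiable up to here
    onehot = tf.to_float(tf.equal(
        logits, tf.reduce_max(logits, reduction_indices=[1], keep_dims=True)))

    # Gradients cannot flow through tf.equal, so the weights get no gradient at all.
    print(tf.gradients(tf.reduce_sum(onehot), [w, b]))   # [None, None]
    print(tf.gradients(tf.reduce_sum(logits), [w, b]))   # real gradient tensors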

The general approach is to use softmax (which is differentiable) during training to approximate the one-hot encoding. Your model already applies softmax after computing the logits, so commenting out the "one-hot" step is all you need to do.
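A self-contained sketch of that pattern (the names here are illustrative, not taken from your model): the softmax lives inside the cross-entropy op during training, and the hard arg-max is only taken at prediction time, outside the gradient path.

    import tensorflow as tf

    # Illustrative stand-ins for one batch of projected outputs and the true targets.
    x = tf.constant([[0.2, -0.7, 1.3]])
    w = tf.Variable(tf.ones([3, 4]))
    labels = tf.constant([[0.0, 0.0, 1.0, 0.0]])

    logits = tf.matmul(x, w)

    # Training path: softmax is applied inside the cross-entropy op, so it stays differentiable.
    xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
    print(tf.gradients(tf.reduce_sum(xent), [w]))   # a real gradient tensor

    # Prediction path: take the hard arg-max only where gradients are no longer needed.
    predicted = tf.argmax(logits, 1)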

Upvotes: 3
