Philip Seyfi

Reputation: 959

Tensorflow placeholder for one-hot encoded labels

I've one-hot encoded labels (11 classes ranging from 0 to 10):

# one-hot encode labels
from sklearn.preprocessing import OneHotEncoder

labels = df.rating.values.reshape([-1, 1])
encoder = OneHotEncoder(sparse=False)
encoder.fit(labels)
labels = encoder.transform(labels)

And have the following placeholders:

# create the graph object
graph = tf.Graph()
# add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
    labels_ = tf.placeholder(tf.int32, [None, 1], name='labels')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

And I'm using sparse_softmax_cross_entropy:

with graph.as_default():
    logits = tf.layers.dense(inputs=outputs[:, -1], units=1)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels_, logits=logits)        
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

TF throws: ValueError: Cannot feed value of shape (500, 1, 11) for Tensor 'labels:0', which has shape '(?, 1)'

I've tried everything and can't get it to work. What is the proper placeholder for one-hot encoded data?

Upvotes: 0

Views: 2018

Answers (1)

Peter Szoldan

Reputation: 4868

The second dimension should be however many classes you have. One-hot encoding means that if you have, say, 10 classes and you encode class 5, you get the vector [ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0 ], which is 10 elements long. So the code should be:

labels_ = tf.placeholder(tf.int32, [None, ***number of classes***], name='labels')
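
To make the shape concrete, here's a small sketch (assuming 11 classes, 0 to 10, as in your question) showing that the second dimension of the encoder's output equals the number of classes:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# hypothetical labels covering all 11 classes (0-10)
ratings = np.arange(11).reshape([-1, 1])

encoder = OneHotEncoder(sparse=False)
one_hot = encoder.fit_transform(ratings)

print(one_hot.shape)  # (11, 11): second dimension == number of classes
print(one_hot[5])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -> class 5, one-hot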

Then again, tf.losses.sparse_softmax_cross_entropy() takes a class label, not a one-hot encoding. So either you decode the one-hot vector with tf.argmax() before feeding it to tf.losses.sparse_softmax_cross_entropy(), like so:

loss = tf.losses.sparse_softmax_cross_entropy(
    labels=tf.argmax(labels_, axis=1), logits=logits)

or, the real question is, why use one-hot encoding at all in the first place? You can just feed df.rating.values.reshape([-1, 1]) to your graph as labels_ and keep the 1 in the second dimension. The whole one-hot encoding block is unnecessary.
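
A minimal sketch of that simpler route, reusing the names from your code (outputs, graph and learning_rate come from parts not shown here):

# plain integer class ids, shape (batch, 1); no one-hot encoding needed
labels = df.rating.values.reshape([-1, 1])

with graph.as_default():
    labels_ = tf.placeholder(tf.int32, [None, 1], name='labels')
    logits = tf.layers.dense(inputs=outputs[:, -1], units=11)  # 11 classes -> 11 logits
    # sparse_softmax_cross_entropy expects integer class labels, not one-hot vectors
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels_, logits=logits)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)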

There are a few other issues in your code (not shown in the original question) that affect this problem. First of all, you feed the network like this:

    feed = {inputs_: x,
            labels_: y[:, None],
            keep_prob: 1,
            initial_state: test_state}

In your effort to fix the labels_ issue, you added the indexing [:, None]. The problem is that in NumPy the index None has a special meaning: it inserts a new dimension. That's where the extra dimension in (500, 1, 11) comes from. Indexing y is unnecessary here, so I've removed it. The code should be:

    feed = {inputs_: x,
            labels_: y,
            keep_prob: 1,
            initial_state: test_state}

Then comes another issue, a very common mistake, in this line:

loss, state, _ = sess.run([loss, final_state, optimizer], feed_dict=feed)

you assign the result of sess.run() to the name loss, so loss is now a plain number, not the loss tensor it should be, and on the second iteration the code fails. I've changed it to

loss_val, state, _ = sess.run([loss, final_state, optimizer], feed_dict=feed)

but of course you need to propagate that change to the print() as well:

print("Epoch: {}/{}".format(e, epochs),
      "Iteration: {}".format(iteration),
      "Train loss: {:.3f}".format(loss_val))

Also, where you define your logits, you need 11 units, since you have 11 classes (0-10) and need one logit (and hence one probability after softmax) for each class:

logits = tf.layers.dense(inputs=outputs[:, -1], units=11)
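
If you also want the validation accuracy shown in the log below, it can be computed from the 11-way logits along these lines (a sketch only, since your accuracy code isn't shown; this assumes labels_ holds the one-hot vectors):

with graph.as_default():
    # predicted class = index of the largest logit; compare to the true class
    correct = tf.equal(tf.argmax(logits, axis=1), tf.argmax(labels_, axis=1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))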

With these changes the training runs and even seems to learn something:

('Epoch: 0/10', 'Iteration: 5', 'Train loss: 1.735')
('Epoch: 0/10', 'Iteration: 10', 'Train loss: 2.092')
('Epoch: 0/10', 'Iteration: 15', 'Train loss: 2.644')
('Epoch: 0/10', 'Iteration: 20', 'Train loss: 1.596')
('Epoch: 0/10', 'Iteration: 25', 'Train loss: 1.759')
Val acc: 0.012
('Epoch: 0/10', 'Iteration: 30', 'Train loss: 1.581')
('Epoch: 0/10', 'Iteration: 35', 'Train loss: 2.213')
('Epoch: 0/10', 'Iteration: 40', 'Train loss: 2.176')
('Epoch: 0/10', 'Iteration: 45', 'Train loss: 1.849')
('Epoch: 0/10', 'Iteration: 50', 'Train loss: 2.474')
Val acc: 0.017

Upvotes: 2
