ParmuTownley

Reputation: 1007

tensorflow: output layer always shows [1.]

This is a discriminative network I'm training so that I can later use it in a generative network. I trained it on a dataset with 2 features, and it does binary classification: 1 = meditating, 0 = not meditating. (The dataset is from one of Siraj Raval's videos.)

For some reason, the output layer (ol) always outputs [1.] for every test case.

My dataset: https://drive.google.com/open?id=0B5DaSp-aTU-KSmZtVmFoc0hRa3c

import pandas as pd
import tensorflow as tf

data = pd.read_csv("E:/workspace_py/datasets/simdata/linear_data_train.csv")
data_f = data.drop("lbl", axis = 1)
data_l = data.drop(["f1", "f2"], axis = 1)

learning_rate = 0.01
batch_size = 1
n_epochs = 30
n_examples = 999 # This is highly unsatisfying >:3
n_iteration = int(n_examples/batch_size)


features = tf.placeholder('float', [None, 2], name='features_placeholder')
labels = tf.placeholder('float', [None, 1], name = 'labels_placeholder')

weights = {
            'ol': tf.Variable(tf.random_normal([2, 1], stddev= -12), name = 'w_ol')
}

biases = {
            'ol': tf.Variable(tf.random_normal([1], stddev=-12), name = 'b_ol')
}

ol = tf.nn.sigmoid(tf.add(tf.matmul(features, weights['ol']), biases['ol']), name = 'ol')

loss = -tf.reduce_sum(labels*tf.log(ol), name = 'loss') # cross entropy
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(n_epochs):
    ptr = 0
    for iteration in range(n_iteration):
        epoch_x = data_f[ptr: ptr + batch_size]
        epoch_y = data_l[ptr: ptr + batch_size]
        ptr = ptr + batch_size

        _, err = sess.run([train, loss], feed_dict={features: epoch_x, labels:epoch_y})
    print("Loss @ epoch ", epoch, " = ", err)

print("Testing...\n")

data = pd.read_csv("E:/workspace_py/datasets/simdata/linear_data_eval.csv")
test_data_l = data.drop(["f1", "f2"], axis = 1)
test_data_f = data.drop("lbl", axis = 1)
#vvvHERE    
print(sess.run(ol, feed_dict={features: test_data_f})) #<<<HERE
#^^^HERE
saver = tf.train.Saver()
saver.save(sess, save_path="E:/workspace_py/saved_models/meditation_disciminative_model.ckpt")
sess.close()

output:

2017-10-11 00:49:47.453721: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-11 00:49:47.454212: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-11 00:49:49.608862: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
2017-10-11 00:49:49.609281: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0 
2017-10-11 00:49:49.609464: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0:   Y 
2017-10-11 00:49:49.609659: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
Loss @ epoch  0  =  0.000135789
Loss @ epoch  1  =  4.16049e-05
Loss @ epoch  2  =  1.84776e-05
Loss @ epoch  3  =  9.41758e-06
Loss @ epoch  4  =  5.24522e-06
Loss @ epoch  5  =  2.98024e-06
Loss @ epoch  6  =  1.66893e-06
Loss @ epoch  7  =  1.07288e-06
Loss @ epoch  8  =  5.96047e-07
Loss @ epoch  9  =  3.57628e-07
Loss @ epoch  10  =  2.38419e-07
Loss @ epoch  11  =  1.19209e-07
Loss @ epoch  12  =  1.19209e-07
Loss @ epoch  13  =  1.19209e-07
Loss @ epoch  14  =  -0.0
Loss @ epoch  15  =  -0.0
Loss @ epoch  16  =  -0.0
Loss @ epoch  17  =  -0.0
Loss @ epoch  18  =  -0.0
Loss @ epoch  19  =  -0.0
Loss @ epoch  20  =  -0.0
Loss @ epoch  21  =  -0.0
Loss @ epoch  22  =  -0.0
Loss @ epoch  23  =  -0.0
Loss @ epoch  24  =  -0.0
Loss @ epoch  25  =  -0.0
Loss @ epoch  26  =  -0.0
Loss @ epoch  27  =  -0.0
Loss @ epoch  28  =  -0.0
Loss @ epoch  29  =  -0.0
Testing...

[[ 1.]
 [ 1.]
 [ 1.]
 ...all 200 rows are [ 1.]...
 [ 1.]
 [ 1.]]
Saving model...
[Finished in 57.9s]

Upvotes: 1

Views: 357

Answers (1)

lejlot

Reputation: 66835

Main problem

First of all, this is not a valid cross-entropy loss. The equation you are using only works with 2 or more outputs. With a single sigmoid output you have to use

-tf.reduce_sum(labels*tf.log(ol) + (1-labels)*tf.log(1-ol), name = 'loss')

otherwise the optimal solution is to always answer "1" (which is exactly what is happening right now).
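
For reference, this is roughly how that looks dropped into your existing graph. It is only a sketch: the `eps` constant and the `tf.clip_by_value` call are my additions (not in your original code) to keep `tf.log` away from 0.

    # Binary cross entropy over a single sigmoid output.
    # eps and the clipping are added here only to keep log() finite.
    eps = 1e-7
    ol_clipped = tf.clip_by_value(ol, eps, 1.0 - eps)
    loss = -tf.reduce_sum(labels * tf.log(ol_clipped)
                          + (1 - labels) * tf.log(1 - ol_clipped), name='loss')
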

Why?

Note that labels is only ever 0 or 1, and your whole loss is the product of the label and the logarithm of the prediction. Consequently, when the true label is 0, your loss is 0 no matter what you predict, since 0 * log(x) = 0 for any x (as long as log(x) is defined). The model is therefore only penalised for failing to predict "1" when it should, so it learns to output 1 all the time.
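
A tiny NumPy check (not part of the original post) makes this concrete: with the one-term loss, a label of 0 contributes nothing no matter what the model predicts.

    import numpy as np

    # -label * log(p): the loss actually used in the question
    for label in (0.0, 1.0):
        for p in (0.1, 0.9):          # predicted probability of "meditating"
            print(label, p, -label * np.log(p))

    # label=0 rows always give 0.0, so confidently predicting "1" on a
    # negative example is never punished; only label=1 rows push p towards 1.
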

Some other odd things

  1. You are providing a negative stddev to the normal distribution, which you should not do (unless this is some undocumented feature of random_normal; according to the docs it accepts a single positive float, and you should pass a small positive number there).

  2. Computing cross entropy like this (in a naive way) is not numerically stable; take a look at tf.nn.sigmoid_cross_entropy_with_logits.

  3. You are not permuting your dataset, so you always process the data in the same order, which can have bad consequences (periodic increases in the loss, slower convergence, or no convergence at all). A rough sketch addressing all three points follows below.
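
Putting the three points together, one way it could look is the following. It reuses the placeholder and DataFrame names from your code (`features`, `labels`, `data_f`, `data_l`); the stddev of 0.1 and the use of `tf.reduce_mean` are my choices, not anything prescribed.

    import numpy as np
    import tensorflow as tf

    features = tf.placeholder('float', [None, 2], name='features_placeholder')
    labels = tf.placeholder('float', [None, 1], name='labels_placeholder')

    # 1. small *positive* stddev for the initialisers
    w_ol = tf.Variable(tf.random_normal([2, 1], stddev=0.1), name='w_ol')
    b_ol = tf.Variable(tf.random_normal([1], stddev=0.1), name='b_ol')

    logits = tf.add(tf.matmul(features, w_ol), b_ol, name='logits')
    ol = tf.nn.sigmoid(logits, name='ol')   # keep this for predictions

    # 2. numerically stable binary cross entropy, computed from the logits
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits),
        name='loss')
    train = tf.train.AdamOptimizer(0.01).minimize(loss)

    # 3. shuffle the training data once per epoch before slicing batches
    perm = np.random.permutation(len(data_f))
    shuffled_f = data_f.values[perm]
    shuffled_l = data_l.values[perm]
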

Upvotes: 2
