Reputation: 1007
This is a discriminative network I'm training so that I can later use it in a generative network. I trained it on a dataset with 2 features, and it does binary classification: 1 = meditating, 0 = not meditating. (The dataset is from one of Siraj Raval's videos.)
For some reason, the output layer (ol) always outputs [1] for every test case.
My dataset: https://drive.google.com/open?id=0B5DaSp-aTU-KSmZtVmFoc0hRa3c
import pandas as pd
import tensorflow as tf
data = pd.read_csv("E:/workspace_py/datasets/simdata/linear_data_train.csv")
data_f = data.drop("lbl", axis = 1)
data_l = data.drop(["f1", "f2"], axis = 1)
learning_rate = 0.01
batch_size = 1
n_epochs = 30
n_examples = 999 # This is highly unsatisfying >:3
n_iteration = int(n_examples/batch_size)
features = tf.placeholder('float', [None, 2], name='features_placeholder')
labels = tf.placeholder('float', [None, 1], name = 'labels_placeholder')
weights = {
    'ol': tf.Variable(tf.random_normal([2, 1], stddev=-12), name='w_ol')
}
biases = {
    'ol': tf.Variable(tf.random_normal([1], stddev=-12), name='b_ol')
}
ol = tf.nn.sigmoid(tf.add(tf.matmul(features, weights['ol']), biases['ol']), name = 'ol')
loss = -tf.reduce_sum(labels*tf.log(ol), name = 'loss') # cross entropy
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for epoch in range(n_epochs):
    ptr = 0
    for iteration in range(n_iteration):
        epoch_x = data_f[ptr: ptr + batch_size]
        epoch_y = data_l[ptr: ptr + batch_size]
        ptr = ptr + batch_size
        _, err = sess.run([train, loss], feed_dict={features: epoch_x, labels: epoch_y})
    print("Loss @ epoch ", epoch, " = ", err)
print("Testing...\n")
data = pd.read_csv("E:/workspace_py/datasets/simdata/linear_data_eval.csv")
test_data_l = data.drop(["f1", "f2"], axis = 1)
test_data_f = data.drop("lbl", axis = 1)
#vvvHERE
print(sess.run(ol, feed_dict={features: test_data_f})) #<<<HERE
#^^^HERE
print("Saving model...")
saver = tf.train.Saver()
saver.save(sess, save_path="E:/workspace_py/saved_models/meditation_disciminative_model.ckpt")
sess.close()
output:
2017-10-11 00:49:47.453721: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-11 00:49:47.454212: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-11 00:49:49.608862: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
2017-10-11 00:49:49.609281: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-10-11 00:49:49.609464: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0: Y
2017-10-11 00:49:49.609659: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
Loss @ epoch 0 = 0.000135789
Loss @ epoch 1 = 4.16049e-05
Loss @ epoch 2 = 1.84776e-05
Loss @ epoch 3 = 9.41758e-06
Loss @ epoch 4 = 5.24522e-06
Loss @ epoch 5 = 2.98024e-06
Loss @ epoch 6 = 1.66893e-06
Loss @ epoch 7 = 1.07288e-06
Loss @ epoch 8 = 5.96047e-07
Loss @ epoch 9 = 3.57628e-07
Loss @ epoch 10 = 2.38419e-07
Loss @ epoch 11 = 1.19209e-07
Loss @ epoch 12 = 1.19209e-07
Loss @ epoch 13 = 1.19209e-07
Loss @ epoch 14 = -0.0
Loss @ epoch 15 = -0.0
Loss @ epoch 16 = -0.0
Loss @ epoch 17 = -0.0
Loss @ epoch 18 = -0.0
Loss @ epoch 19 = -0.0
Loss @ epoch 20 = -0.0
Loss @ epoch 21 = -0.0
Loss @ epoch 22 = -0.0
Loss @ epoch 23 = -0.0
Loss @ epoch 24 = -0.0
Loss @ epoch 25 = -0.0
Loss @ epoch 26 = -0.0
Loss @ epoch 27 = -0.0
Loss @ epoch 28 = -0.0
Loss @ epoch 29 = -0.0
Testing...
[[ 1.]
 [ 1.]
 [ 1.]
 ...
 [ 1.]
 [ 1.]]
(truncated: all 200 test predictions are [ 1.])
Saving model...
[Finished in 57.9s]
Upvotes: 1
Views: 357
Reputation: 66835
First of all, this is not a valid cross-entropy loss. The equation you are using only works with two or more outputs. With a single sigmoid output you have to use
-tf.reduce_sum(labels*tf.log(ol) + (1-labels)*tf.log(1-ol), name = 'loss')
otherwise the optimal solution is to always answer "1" (which is happening right now).
Why?
Note that labels is only ever 0 or 1, and your whole loss is a multiplication of the label and the logarithm of the prediction. Consequently, when the true label is 0, your loss is 0 no matter what you predict, since 0 * log(x) = 0 for any x (as long as log(x) is defined). Your model is therefore only penalised for failing to predict "1" when it should, so it learns to output 1 all the time.
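A quick plain-Python check (the probability values are hypothetical, purely for illustration) makes the asymmetry visible:

import math

def one_sided_loss(label, p):   # the loss used in the question: -label * log(p)
    return -label * math.log(p)

def binary_cross_entropy(label, p):   # the corrected loss
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

print(one_sided_loss(0, 0.99))          # -0.0 -> predicting "1" on a "0" example costs nothing
print(binary_cross_entropy(0, 0.99))    # ~4.6 -> the proper loss penalises it heavily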
You are providing a negative stddev to the normal distribution, which you should not do (unless this is some undocumented feature of random_normal, but according to the docs it should receive a single positive float, and you should provide a small number there).
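For instance, a small positive value would look like this (0.1 is only an illustrative choice):

weights = {
    'ol': tf.Variable(tf.random_normal([2, 1], stddev=0.1), name='w_ol')
}
biases = {
    'ol': tf.Variable(tf.random_normal([1], stddev=0.1), name='b_ol')
}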
Computing cross entropy like this (in a naive way) is not numerically stable; take a look at tf.nn.sigmoid_cross_entropy_with_logits.
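A minimal sketch of that (TF 1.x API, reusing the variable names from the question): keep the raw logits and let the loss op apply the sigmoid internally.

# raw logits; the sigmoid is applied inside the loss op for numerical stability
logits = tf.add(tf.matmul(features, weights['ol']), biases['ol'], name='logits')
ol = tf.nn.sigmoid(logits, name='ol')   # still available for making predictions
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits),
    name='loss')
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)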
You are not shuffling (permuting) your dataset, so you always process the data in the same order, which can have bad consequences (periodic increases in the loss, harder convergence, or no convergence at all).
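For example, one way to do that (a sketch using numpy; the shuffled_* names are just illustrative) is to draw a fresh permutation at the start of every epoch:

import numpy as np

for epoch in range(n_epochs):
    perm = np.random.permutation(len(data_f))   # new random order each epoch
    shuffled_f = data_f.iloc[perm]
    shuffled_l = data_l.iloc[perm]
    ptr = 0
    for iteration in range(n_iteration):
        epoch_x = shuffled_f[ptr: ptr + batch_size]
        epoch_y = shuffled_l[ptr: ptr + batch_size]
        ptr += batch_size
        _, err = sess.run([train, loss], feed_dict={features: epoch_x, labels: epoch_y})
    print("Loss @ epoch ", epoch, " = ", err)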
Upvotes: 2