Reputation: 45
I am building a simple one-hidden-layer neural network with TensorFlow.
For the inputs, every row of data corresponds to 10 answers. The first 2 elements of each row always agree with the ground-truth label, while the last 8 elements are always the opposite of it.
For example,
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0], correct is 1
[0, 0, 1, 1, 1, 1, 1, 1, 1, 1], correct is 0
[0, 0, 1, 1, 1, 1, 1, 1, 1, 1], correct is 0
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0], correct is 1
I would like my neural network to learn that the first two elements/features always give the correct result. Therefore, I expect the network to assign larger weights to the first two features. However, the network always gets stuck at some loss value.
More interestingly, the accuracy is taken as the proportion of labels that are correctly predicted out of the total number of labels. The loss is the cross-entropy of the sigmoid output, i.e., $-(y \log(\hat{y}) + (1-y) \log(1-\hat{y}))$, where $\hat{y}$ is the sigmoid of the logit. Sometimes, as the loss decreased, the accuracy decreased as well, e.g.,
epoch is: 0 loss is: 7.661093 accuracy value is: 1.0
epoch is: 100 loss is: 7.579134 accuracy value is: 0.54545456
epoch is: 200 loss is: 7.5791006 accuracy value is: 0.54545456
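For concreteness, here is a minimal NumPy sketch of what I mean by these two quantities (the names y_true and y_prob are illustrative, not from my code below):

import numpy as np

def metrics(y_true, y_prob, eps=1e-10):
    # Cross-entropy averaged over samples; clip to avoid log(0).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    # Accuracy: fraction of 0.5-thresholded predictions that match the labels.
    accuracy = np.mean((y_prob > 0.5) == (y_true == 1))
    return loss, accuracy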
I thought the network could keep increasing the weights of the first two elements until it predicts the correct label perfectly.
Can anyone tell me what I should do to help the network predict the label correctly, instead of getting stuck?
My code is here:
import tensorflow as tf
import numpy as np


class SigmoidNeuralNetwork():
    def __init__(self, learning_rate, training_data, correct_labels, epoch_number):
        self.learning_rate = learning_rate
        self.training_data = training_data
        self.correct_labels = correct_labels
        self.X = tf.placeholder(tf.float32)
        self.y = tf.placeholder(tf.float32)
        self.feature_num = len(self.training_data[0])
        self.sample_num = len(self.training_data)
        self.W = tf.Variable(tf.random_uniform([self.feature_num, 1], -1.0, 1.0), dtype=tf.float32)
        self.b = tf.Variable([0.0])
        self.epoch_number = epoch_number

    def launch_network(self):
        db = tf.matmul(self.X, tf.reshape(self.W, [-1, 1])) + self.b
        hyp = tf.sigmoid(db)
        cost0 = self.y * tf.log(tf.clip_by_value(hyp, 1e-10, 1.0))
        cost1 = (1 - self.y) * tf.log(tf.clip_by_value((1 - hyp), 1e-10, 1.0))
        cost = (cost0 + cost1) / float(self.sample_num)
        loss = -tf.reduce_sum(cost)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)
        train = optimizer.minimize(loss)
        #
        new_train_X = self.training_data.astype(np.float32)
        output = tf.add(tf.matmul(new_train_X, self.W), self.b)
        prediction = tf.sigmoid(output)
        predicted_class = tf.greater(prediction, 0.5)
        ground_labels = tf.reshape(tf.equal(self.y, 1.0), predicted_class.shape)
        correct = tf.equal(predicted_class, ground_labels)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        #
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        for epoch in range(self.epoch_number):
            _, loss_val, accuracy_val = sess.run([train, loss, accuracy],
                                                 {self.X: self.training_data, self.y: self.correct_labels})
            if epoch % 100 == 0:
                print "epoch is: ", epoch, "loss is: ", loss_val, " accuracy value is: ", accuracy_val
                # print "weight is: ", sess.run(self.W).flatten()


train_data = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
])
correct_answers = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1])
sigmoid_network = SigmoidNeuralNetwork(learning_rate=0.01, training_data=train_data,
                                       correct_labels=correct_answers, epoch_number=10000)
sigmoid_network.launch_network()
Upvotes: 1
Views: 749
Reputation: 578
OP wrote:
I thought the network could keep increasing the weights of the first two elements until it can completely predict the correct label.
You are completely right.
Can anyone please tell me what should I do to facilitate the network correctly predicting the label, instead of getting stuck?
The problem is in the function launch_network():

def launch_network(self):
    db = tf.matmul(self.X, tf.reshape(self.W, [-1, 1])) + self.b
    hyp = tf.sigmoid(db)
    cost0 = self.y * tf.log(tf.clip_by_value(hyp, 1e-10, 1.0))
    ... (skip) ...
Note that db and hyp have the same shape, (self.sample_num, 1) (2-dimensional), but the shape of self.y (that is, of correct_answers) is (self.sample_num,) (1-dimensional). In the line that computes cost0, you multiply self.y * tf.log(...hyp...), and broadcasting expands the result to shape (self.sample_num, self.sample_num), not (self.sample_num, 1).
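You can watch this silent broadcast happen with plain NumPy (a minimal sketch; the shapes mirror hyp and the array fed to self.y):

import numpy as np

hyp = np.full((11, 1), 0.5)                       # same shape as hyp: (sample_num, 1)
y = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1])   # correct_answers: (sample_num,)

print((y * np.log(hyp)).shape)                    # (11, 11): silently broadcast
print((y[:, np.newaxis] * np.log(hyp)).shape)     # (11, 1): the intended shape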
The simplest solution is to change the shape of correct_answers to (self.sample_num, 1) (2-dimensional) rather than (self.sample_num,) (1-dimensional), as follows:

correct_answers = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1])[:, np.newaxis]
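With that change, self.y * tf.log(...) keeps the shape (self.sample_num, 1), the per-sample costs line up with the labels, and the loss can fall all the way to zero. Equivalently, you could force the shape inside the graph instead of reshaping the data (a sketch of an alternative, not part of the original code):

y2d = tf.reshape(self.y, [-1, 1])  # force the labels to (sample_num, 1)
cost0 = y2d * tf.log(tf.clip_by_value(hyp, 1e-10, 1.0))
cost1 = (1 - y2d) * tf.log(tf.clip_by_value(1 - hyp, 1e-10, 1.0))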
Upvotes: 1