Reputation: 355
I'm trying to build a neural network in tensorflow to learn the library better, and my loss value is not changing. This is my code:
import tensorflow as tf
import numpy as np
import pandas as pd
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
all_data = pd.read_csv('/projects/data/testfile.csv')
all_data = all_data.values
size_layer1 = 1
size_layer2 = 10
size_layer3 = 1
labels = all_data[:, 9]
labels = tf.convert_to_tensor(labels, np.float32)
labels = tf.reshape(labels, [985, 1])
data = all_data[:, 6]
data = tf.convert_to_tensor(data, np.float32)
theta1 = tf.Variable(tf.zeros([size_layer2, size_layer1]))
theta1 = tf.reshape(theta1, [10, 1])
theta2 = tf.Variable(tf.zeros([size_layer3, size_layer2]))
theta2 = tf.reshape(theta2, [1, 10])
a1 = data
a1 = tf.reshape(a1, [1, 985])
z2 = tf.matmul(theta1, a1)
a2 = tf.nn.relu(z2)
z3 = tf.matmul(theta2, a2)
a3 = tf.nn.sigmoid(z3)
h = tf.transpose(a3)
cost = tf.losses.mean_squared_error(labels, h)
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        sess.run(train)
        print(sess.run(cost))
My full dataset is 985x12, but most of the columns are text, so I isolated two numeric columns. I know a neural network shouldn't be used like this, with a 1:10:1 architecture and real-valued labels, but I'm not trying to optimize the network, just to learn the library. I also know I should be using feature scaling/mean normalization, but again, I'm not trying to tune the net perfectly. This is my output:
73948990000.0
73948990000.0
73948990000.0
73948990000.0
73948990000.0
73948990000.0
73948990000.0
73948990000.0
73948990000.0
73948990000.0
I've tried a lot of things. Originally my cost function was ordinary cross-entropy, but since my labels are real-valued I changed it to mean squared error. I also tried changing the optimizer, and that didn't change anything either. Is the problem simply that I'm using a bad architecture by design, or is it something else?
Upvotes: 0
Views: 68
Reputation: 2419
The initial weights theta1 and theta2 are arrays of zeros, which can't be used for training. The weight updates are computed from gradients that flow through these same weights, so with everything at zero the deltas come out zero and the weights never change. Initializing all weights to the same non-zero value is just as bad: by symmetry every unit receives an identical delta, which also prevents learning. The initial weights therefore need to be random numbers.
Try initializing the weights randomly instead, for example with a Xavier initializer:
theta1 = tf.get_variable('theta1', shape=(size_layer2, size_layer1), initializer=tf.contrib.layers.xavier_initializer())
theta2 = tf.get_variable('theta2', shape=(size_layer3, size_layer2), initializer=tf.contrib.layers.xavier_initializer())
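Note that tf.get_variable already creates the variables with shapes (10, 1) and (1, 10), so the tf.reshape calls on the weights in your original code are no longer needed. If you'd rather avoid tf.contrib, a plain random-normal initializer is enough to break the symmetry (a sketch; the stddev of 0.1 is an arbitrary choice):
# tf.contrib-free alternative: small random normal values break the symmetry
theta1 = tf.Variable(tf.random_normal([size_layer2, size_layer1], stddev=0.1))
theta2 = tf.Variable(tf.random_normal([size_layer3, size_layer2], stddev=0.1))
Either way, re-running your training loop should now show the cost changing from step to step.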
Upvotes: 1