Reputation: 1329
2 layers MLP (Relu) + Softmax
After 20 iterations, Tensor Flow just gives up and stops updating any weights or biases.
I initially thought that my ReLu where dying, so I displayed histograms to make sure none of them where 0. And none of them are !
They just stop changing after few iterations and cross entropy is still high. ReLu, Sigmoid and tanh gives the same results. Tweaking GradientDescentOptimizer from 0.01 to 0.5 also doesn't change much.
There has to be a bug somewhere. Like an actual bug in my code. I can't even overfit a small sample set !
Here are my histograms and here's my code, if anyone could check it out, that would be a major help.
We have 3000 scalars with 6 values between 0 and 255 to classify in two classes : [1,0] or [0,1] (I made sure to randomise the order)
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
with tf.name_scope(layer_name):
weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
tf.summary.histogram('weights', weights)
biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
tf.summary.histogram('biases', biases)
preactivate = tf.matmul(input_tensor, weights) + biases
tf.summary.histogram('pre_activations', preactivate)
#act=tf.nn.relu
activations = act(preactivate, name='activation')
tf.summary.histogram('activations', activations)
return activations
#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])
#After normalisation, input is between 0 and 1
normalised = tf.scalar_mul(1/255,x)
#Two layers
hidden1 = nn_layer(normalised, 6, 4, "hidden1")
hidden2 = nn_layer(hidden1, 4, 2, "hidden2")
#Finish by a softmax
softmax = tf.nn.softmax(hidden2)
#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))
tf.summary.scalar('cross_entropy', cross_entropy)
correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)
#Init session and writers and misc
session = tf.Session()
train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)
init= tf.global_variables_initializer()
session.run(init)
merged = tf.summary.merge_all()
#Train
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
session.run(train_step, {x: batch_x, y: batch_y})
#Every 10 steps, add to the summary
if _ % 10 == 0:
s = session.run(merged, {x: batch_x, y: batch_y})
train_writer.add_summary(s, _)
#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
Hidden Layer 1. Output isn't zero, so that's not a dying ReLu problem. but still, weights are constant! TF didn't even try to modify them
Same for Hidden Layer 2. TF tried tweaking them a bit and gave up pretty fast.
Cross entropy does decrease, but stays staggeringly high.
EDIT : LOTS of mistakes in my code. First one is 1/255 = 0 in python... Changed it to 1.0/255.0 and my code started to live.
So basically, my input was multiplied by 0 and the neural network just was purely blind. So he tried to get the best result he could while being blind and then gave up. Which explains totally it's reaction.
Now I was applying a softmax twice... Modifying it helped also. And by strying different learning rates and different number of epoch I finally found something good.
Here is the final working code :
def runModel(self):
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
with tf.name_scope(layer_name):
#This is standard weight for neural networks with ReLu.
#I divide by math.sqrt(float(6)) because my input has 6 values
weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
tf.summary.histogram('weights', weights)
#I chose this bias myself. It work. Not sure why.
biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
tf.summary.histogram('biases', biases)
preactivate = tf.matmul(input_tensor, weights) + biases
tf.summary.histogram('pre_activations', preactivate)
#Some neurons will have ReLu as activation function
#Some won't have any activation functions
if act == "None":
activations = preactivate
else :
activations = act(preactivate, name='activation')
tf.summary.histogram('activations', activations)
return activations
#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])
#After normalisation, input is between 0 and 1
#Normalising input really helps. Nothing is doable without it
#But my ERROR was to write 1/255. Becase in python
#1/255 = 0 .... (integer division)
#But 1.0/255.0 = 0,003921568 (float division)
normalised = tf.scalar_mul(1.0/255.0,x)
#Three layers total. The first one is just a matrix multiplication
input = nn_layer(normalised, 6, 4, "input", act="None")
#The second one has a ReLu after a matrix multiplication
hidden1 = nn_layer(input, 4, 4, "hidden", act=tf.nn.relu)
#The last one is also jsut a matrix multiplcation
#WARNING ! No softmax here ! Because later we call a function
#That implicitly does a softmax
#And it's bad practice to do two softmax one after the other
output = nn_layer(hidden1, 4, 2, "output", act="None")
#Tried different learning rates
#Higher learning rate means find a result faster
#But could be a local minimum
#Lower learning rate means we need much more epochs
learning_rate = 0.03
with tf.name_scope('learning_rate_'+str(learning_rate)):
#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
tf.summary.scalar('cross_entropy', cross_entropy)
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)
#Init session and writers and misc
session = tf.Session()
train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)
init= tf.global_variables_initializer()
session.run(init)
merged = tf.summary.merge_all()
#Train
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
session.run(train_step, {x: batch_x, y: batch_y})
#Every 10 steps, add to the summary
if _ % 10 == 0:
s = session.run(merged, {x: batch_x, y: batch_y})
train_writer.add_summary(s, _)
#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
Upvotes: 3
Views: 664
Reputation: 135
Just incase someone needs it in the future:
I had initialized my dual layer network's layers with np.random.randn
but the network refused to learn. Using the He (for ReLU) and Xavier(for softmax) initializations totally worked.
Upvotes: 0
Reputation: 1009
I'm afraid that you have to reduce your learning rate. It's to high. High learning rate usually leads you to local minimum not global one.
Try 0.001, 0.0001 or even 0.00001. Or make your learning rate flexible.
I did not checked the code, so firstly try to tune LR.
Upvotes: 5