Mik3l

Reputation: 27

Relu function, return 0 and big numbers

Hi, I'm new to neural networks with TensorFlow. I've taken a small fraction of the spaces365 dataset, and I want to build a neural network to classify between 10 places.

For that I've tried to build a small copy of a VGG network. The problem is that the output of the softmax function is a one-hot encoded array. Looking for problems in my code, I've realised that the outputs of the ReLU functions are either 0 or a big number (around 10000).

I don't know where I'm going wrong. Here is my code:

import tensorflow as tf  # TF 1.x style graph code

def variables(shape):
    # Weights drawn uniformly from [-1, 1]
    return tf.Variable(2 * tf.random_uniform(shape, seed=1) - 1)

def layerConv(x, filters):
    return tf.nn.conv2d(x, filters, strides=[1, 1, 1, 1], padding='SAME')

def maxpool(x):
    return tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

# Placeholders (assumed): 256x256 grayscale images and one-hot labels for 10 classes
input = tf.placeholder(tf.float32, [None, 256, 256, 1])
output = tf.placeholder(tf.float32, [None, 10])
learning_rate = 0.1

weights0 = variables([3, 3, 1, 16])
l0 = tf.nn.relu(layerConv(input, weights0))
l0 = maxpool(l0)    # -> 128x128x16

weights1 = variables([3, 3, 16, 32])
l1 = tf.nn.relu(layerConv(l0, weights1))
l1 = maxpool(l1)    # -> 64x64x32

weights2 = variables([3, 3, 32, 64])
l2 = tf.nn.relu(layerConv(l1, weights2))
l2 = maxpool(l2)    # -> 32x32x64

l3 = tf.reshape(l2, [-1, 64 * 32 * 32])

syn0 = variables([64 * 32 * 32, 1024])
bias0 = variables([1024])
l4 = tf.nn.relu(tf.matmul(l3, syn0) + bias0)
l4 = tf.layers.dropout(inputs=l4, rate=0.4)

syn1 = variables([1024, 10])
bias1 = variables([10])
output_pred = tf.nn.softmax(tf.matmul(l4, syn1) + bias1)

error = tf.square(tf.subtract(output_pred, output), name='error')
loss = tf.reduce_sum(error, name='cost')

# TRAINING
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)

The input of the neural network is a normalized grayscale image of 256*256 pixels. The learning rate is 0.1 and the batch size is 32.
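For reference, I run the training step roughly like this (just a sketch; next_batch is a hypothetical helper that returns a batch of 32 normalized images and their one-hot labels):

# Minimal training-loop sketch; next_batch is a hypothetical batching helper
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        batch_x, batch_y = next_batch(32)
        _, batch_loss = sess.run([train, loss],
                                 feed_dict={input: batch_x, output: batch_y})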

Thank you in advance!!

Upvotes: 1

Views: 831

Answers (2)

Thomas Pinetz

Reputation: 7148

The problem you have is your weight initialization. NNs are highly complicated non-convex optimization problems, so a good initialization is paramount to getting any good results. If you use ReLUs you should use the initialization proposed by He et al. (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf?spm=5176.100239.blogcont55892.28.pm8zm1&file=He_Delving_Deep_into_ICCV_2015_paper.pdf).

In essence, the weights of your network should be initialized with i.i.d. Gaussian-distributed values with mean 0 and standard deviation as follows:

stddev = sqrt(2 / Nr_input_neurons)
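A minimal sketch of such an initializer in the same TF 1.x style as the question (the he_variables name and the use of tf.truncated_normal are my own choices, not part of the asker's code):

import numpy as np
import tensorflow as tf

def he_variables(shape):
    # He et al. init: zero-mean Gaussian with stddev = sqrt(2 / fan_in).
    # For a conv filter [k_h, k_w, in_ch, out_ch] the fan-in is k_h * k_w * in_ch;
    # for a dense weight [n_in, n_out] it is n_in.
    fan_in = int(np.prod(shape[:-1]))
    return tf.Variable(tf.truncated_normal(shape, stddev=np.sqrt(2.0 / fan_in), seed=1))

# e.g. weights0 = he_variables([3, 3, 1, 16]), syn0 = he_variables([64*32*32, 1024])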

Upvotes: 1

pissall

Reputation: 7399

Essentially, what ReLU is:

def relu(vector):
    # Zero out the negative entries of a NumPy array (modifies it in place)
    vector[vector < 0] = 0
    return vector

and softmax:

import numpy as np

def softmax(x):
    # Subtract the max for numerical stability, then normalize the exponentials
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

The output of softmax being a one-hot encoded array means there is a problem, and that could be many things.

For starters, you can try reducing the learning_rate; use 1e-4 / 1e-3 and check. If that doesn't work, try adding some regularization. I am also skeptical about your weight initialization.

Regularization: This is a form of regression that constrains/regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. - Regularization in ML

Link to: Build a multilayer neural network with L2 regularization in tensorflow
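As a rough sketch of both suggestions combined (a smaller learning rate and an L2 penalty on the weight matrices from the question; the l2_lambda value is just an assumed starting point):

learning_rate = 1e-3   # reduced from 0.1
l2_lambda = 1e-4       # assumed regularization strength, tune as needed

# L2 penalty over the weight variables defined in the question
l2_penalty = l2_lambda * tf.add_n(
    [tf.nn.l2_loss(w) for w in [weights0, weights1, weights2, syn0, syn1]])
regularized_loss = loss + l2_penalty

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(regularized_loss)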

Upvotes: 2
