Reputation: 61
So, I'm trying to learn TensorFlow and, for that, I'm trying to create a classifier for something that I think is not so hard: predicting whether a number is odd or even. The problem is that TensorFlow always predicts the same output. I have searched for answers over the last few days, but nothing helped. I looked at the following questions:
- Tensorflow predicts always the same result
- TensorFlow always converging to same output for all items after training
- TensorFlow always return same result
Here's my code:
in:
df
    nb  y1
0    1   0
1    2   1
2    3   0
3    4   1
4    5   0
...
19  20   1
inputX = df.loc[:, ['nb']].as_matrix()
inputY = df.loc[:, ['y1']].as_matrix()
print(inputX.shape)
print(inputY.shape)
out:
(20, 1)
(20, 1)
in:
# Parameters
learning_rate = 0.00000001
training_epochs = 2000
display_step = 50
n_samples = inputY.size
x = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
y_values = tf.add(tf.matmul(x, W), b)
y = tf.nn.relu(y_values)
y_ = tf.placeholder(tf.float32, [None,1])
# Cost function: Mean squared error
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initialize variables and TensorFlow session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict={x: inputX, y_: inputY})  # Take a gradient descent step using our inputs and labels
    # Display logs per epoch step
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict={x: inputX, y_: inputY})
        print("Training step:", '%04d' % i, "cost=", "{:.9f}".format(cc))  # , "W=", sess.run(W), "b=", sess.run(b)
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict={x: inputX, y_: inputY})
print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
out:
Training step: 0000 cost= 0.250000000
Training step: 0050 cost= 0.250000000
Training step: 0100 cost= 0.250000000
...
Training step: 1800 cost= 0.250000000
Training step: 1850 cost= 0.250000000
Training step: 1900 cost= 0.250000000
Training step: 1950 cost= 0.250000000
Optimization Finished!
Training cost= 0.25 W= [[ 0.]] b= [ 0.]
in:
sess.run(y, feed_dict={x: inputX })
out:
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]], dtype=float32)
I tried playing with my hyperparameters, such as the learning rate and the number of training epochs. I changed the activation function from softmax to relu. I changed my dataframe to have more examples, but nothing happened. I also tried random initial values for my weights, but nothing changed; the cost just started at a higher value.
Upvotes: 3
Views: 3609
Reputation: 153
First of all, I have to admit that I have never used TensorFlow. But I think you have a modelling problem here.
You are using the simplest network architecture possible (a 1-dimensional perceptron). You have two variables (w and b) which you want to learn, and your decision rule for the output looks like
w * x + b > 0
If you subtract b and divide by w (assuming w > 0), you get
x > -b / w
So you are basically looking for a threshold to separate odd and even numbers. No matter how you choose w and b, you will always misclassify about half of the numbers.
Although deciding whether a number is odd or even seems to be a super trivial task for us humans, it is not for a single perceptron.
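The threshold argument can be checked by brute force. Below is a minimal sketch in plain Python (no TensorFlow), assuming the question's data of the numbers 1 to 20 with label 1 for even numbers. The names `numbers`, `labels` and `best_correct` are made up for this illustration. It tries every threshold between the integers, in both orientations, and keeps the best score.

```python
# Try every single-threshold rule on the question's data (1..20, label 1 = even).
numbers = list(range(1, 21))
labels = [1 if n % 2 == 0 else 0 for n in numbers]

best_correct = 0
for k in range(0, 21):
    threshold = k + 0.5  # thresholds between integers cover every distinct split
    for predict_one_above in (True, False):  # both orientations of the rule
        correct = 0
        for n, label in zip(numbers, labels):
            above = n > threshold
            prediction = 1 if above == predict_one_above else 0
            if prediction == label:
                correct += 1
        best_correct = max(best_correct, correct)

print(best_correct, "of", len(numbers), "correct for the best threshold")
```

The best threshold gets 11 of the 20 numbers right, which is barely better than always guessing the same class.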
Upvotes: 2
Reputation: 371
The main problem I see is that you initialize the weights in the W matrix with zeros. The operation in the linear layer is basically Wx + b, so the gradient with respect to x is W. If you start with zeros for W, that gradient is 0 as well and you are not able to learn anything. Try using random initial values, as stated on tensorflow.org:
# Create two variables.
weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
                      name="weights")
biases = tf.Variable(tf.zeros([200]), name="biases")
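To see what zero initialization does to the question's exact model, here is a hand-written forward and backward pass in plain Python. It is only a sketch, not TensorFlow code: the function `cost_and_grads` is hypothetical, and it assumes the ReLU derivative at exactly 0 is taken as 0 (to my knowledge that is also what TensorFlow's ReLU gradient does, but treat it as an assumption).

```python
# Manual forward/backward pass for y = relu(w * x + b) with the question's cost.
xs = list(range(1, 21))
ts = [1.0 if x % 2 == 0 else 0.0 for x in xs]  # label 1 = even, as in the dataframe
n = len(xs)

def cost_and_grads(w, b):
    """Return (cost, dcost/dw, dcost/db) for cost = sum((t - y)^2) / (2 * n)."""
    cost = grad_w = grad_b = 0.0
    for x, t in zip(xs, ts):
        z = w * x + b
        y = max(z, 0.0)                    # ReLU
        relu_grad = 1.0 if z > 0 else 0.0  # assumption: derivative at z == 0 is 0
        cost += (t - y) ** 2 / (2 * n)
        dcost_dy = (y - t) / n
        grad_w += dcost_dy * relu_grad * x
        grad_b += dcost_dy * relu_grad
    return cost, grad_w, grad_b

zero_cost, zero_gw, zero_gb = cost_and_grads(0.0, 0.0)
print("zero init:  cost=%.4f grad_w=%.4f grad_b=%.4f" % (zero_cost, zero_gw, zero_gb))

small_cost, small_gw, small_gb = cost_and_grads(0.01, 0.01)
print("small init: cost=%.4f grad_w=%.4f grad_b=%.4f" % (small_cost, small_gw, small_gb))
```

With W = 0 and b = 0 every pre-activation is exactly 0, so under that assumption both gradients vanish and the cost stays at the 0.25 shown in the question's log. With small positive values the gradients are nonzero, so training can at least move.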
Upvotes: 3
Reputation: 1114
From a quick look at the code, it looks OK to me (apart from initializing the weights to zero; usually you want a small nonzero value to avoid a trivial solution). However, I don't think you can fit the parity of integers with a linear regression.
The point is that you are trying to fit
x % 2
with predictions of the form
activation(x * w + b)
and there is no way to find a good w and b to solve this problem.
Another way to understand this is to plot your data: the scatter plot of the parity of x forms two horizontal lines of points, and the only way to fit them with a line is with a flat line (which will have a high cost anyway).
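The flat-line claim can be checked without plotting. The sketch below fits an ordinary least-squares line to the question's data (numbers 1 to 20, label 1 for even) using the closed-form formulas. It leaves out the activation function, so it only illustrates the linear part; the variable names are made up for this example.

```python
# Closed-form least-squares line through the parity data.
xs = list(range(1, 21))
ys = [1.0 if x % 2 == 0 else 0.0 for x in xs]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Mean squared error of the fitted line.
mse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n

print("slope=%.5f intercept=%.5f mse=%.5f" % (slope, intercept, mse))
```

The fitted slope is close to zero and the mean squared error stays just under 0.25, which is what always predicting 0.5 would give.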
I think it would be better to start with different data, but if you want to tackle this problem, you should get some results by using a sine or a cosine as the activation function.
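To illustrate why a periodic activation helps, here is a small sketch in plain Python. The values w = pi and b = 0 are chosen by hand rather than learned, and the rescaling (1 + cos(...)) / 2 is my own addition so that the output falls between 0 and 1. The function `predict` is hypothetical.

```python
import math

# Hand-picked parameters (assumption, not learned): cos(pi * x) is +1 for even x, -1 for odd x.
w, b = math.pi, 0.0

def predict(x):
    """Map activation(x * w + b) with a cosine activation into the range [0, 1]."""
    return (1.0 + math.cos(w * x + b)) / 2.0

for x in range(1, 7):
    print(x, round(predict(x)))

# Compare against the question's labels (1 = even) on the full 1..20 range.
all_correct = all(round(predict(x)) == (1 if x % 2 == 0 else 0) for x in range(1, 21))
print("all correct:", all_correct)
```

With these parameters every number from 1 to 20 is classified correctly, so a good w and b do exist once the activation is periodic. Whether gradient descent actually finds them from a random start is a separate question.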
Upvotes: 3