Reputation: 115
I am following Tom Hope's book to learn Tensorflow and I arrived upon this Logistic Regression Example:
import tensorflow as tf
import numpy as np

N = 20000

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# === Create data and simulate results =====
x_data = np.random.randn(N, 3)
w_real = [0.3, 0.5, 0.1]
b_real = -0.2

wxb = np.matmul(w_real, x_data.T) + b_real
y_data_pre_noise = sigmoid(wxb)
y_data = np.random.binomial(1, y_data_pre_noise)
NUM_STEPS = 50

g1 = tf.Graph()
wb_ = []

with g1.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_true = tf.placeholder(tf.float32, shape=None)

    with tf.name_scope('inference') as scope:
        w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
        b = tf.Variable(0, dtype=tf.float32, name='bias')
        y_pred = tf.matmul(w, tf.transpose(x)) + b

    with tf.name_scope('loss') as scope:
        loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred))

    with tf.name_scope('train') as scope:
        learning_rate = 0.5
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        train = optimizer.minimize(loss)

    # Before starting, initialize the variables. We will 'run' this first.
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for step in range(NUM_STEPS):
            sess.run(train, {x: x_data, y_true: y_data})
            if (step % 5 == 0):
                print(step, sess.run([w, b]))
                wb_.append(sess.run([w, b]))

        print(50, sess.run([w, b]))
The example runs as described in the book. However, there is one thing I do not understand: why has the author not used tf.sigmoid() for the inference?
with tf.name_scope('inference') as scope:
    w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
    b = tf.Variable(0, dtype=tf.float32, name='bias')
    y_pred = tf.sigmoid(tf.matmul(w, tf.transpose(x)) + b)   # <-- the modification in question
Is there something obvious that I have overlooked? Also, the results look largely similar with and without this modification.
Upvotes: 2
Views: 149
Reputation: 53758
Because the loss function tf.nn.sigmoid_cross_entropy_with_logits applies the sigmoid itself. Applying it twice during training would skew the predictions towards the center: the network would not be able to output values close to 0 or 1. This question is an example of exactly that problem.
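To see the effect, here is a minimal NumPy sketch (not part of the book's example) showing how much a second sigmoid compresses the output range:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

logits = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# Sigmoid applied once: the outputs span almost the full (0, 1) range.
print(sigmoid(logits))           # ~[0.00005, 0.27, 0.50, 0.73, 0.99995]

# Sigmoid applied twice (what would happen if y_pred already contained a
# sigmoid and the loss applied another one): the outputs are squeezed into
# (sigmoid(0), sigmoid(1)) ≈ (0.50, 0.73), so they can never approach 0 or 1.
print(sigmoid(sigmoid(logits)))  # ~[0.50, 0.57, 0.62, 0.68, 0.73]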
If one is interested in probability inference, they can add a separate op, e.g.

y_proba = tf.sigmoid(y_pred)

... but still use the raw y_pred in training (and y_proba at test time).
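As a minimal sketch of how that could fit into the graph from the question (the name y_proba and the held-out batch x_test are just illustrative, not from the book):

with tf.name_scope('inference') as scope:
    w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
    b = tf.Variable(0, dtype=tf.float32, name='bias')
    y_pred = tf.matmul(w, tf.transpose(x)) + b            # raw logits, fed to the loss
    y_proba = tf.sigmoid(y_pred, name='probabilities')    # probabilities, for inference only

# Training still minimizes sigmoid_cross_entropy_with_logits on y_pred;
# at test time one would run the probability op instead:
#     probs = sess.run(y_proba, {x: x_test})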
Upvotes: 1