Hood Khizer

Reputation: 115

Sigmoid in logistic regression (Tom Hope's Guide to Building Deep Learning Systems)

I am following Tom Hope's book to learn TensorFlow and came across this logistic regression example:

import tensorflow as tf
import numpy as np

N = 20000

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# === Create data and simulate results =====
x_data = np.random.randn(N, 3)
w_real = [0.3, 0.5, 0.1]
b_real = -0.2
wxb = np.matmul(w_real, x_data.T) + b_real
y_data_pre_noise = sigmoid(wxb)
y_data = np.random.binomial(1, y_data_pre_noise)  # sample 0/1 labels from the probabilities

NUM_STEPS = 50
g1 = tf.Graph()
wb_ = []
with g1.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_true = tf.placeholder(tf.float32, shape=None)
    with tf.name_scope('inference') as scope:
        w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
        b = tf.Variable(0, dtype=tf.float32, name='bias')
        y_pred = tf.matmul(w, tf.transpose(x)) + b
    with tf.name_scope('loss') as scope:
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred))
    with tf.name_scope('train') as scope:
        learning_rate = 0.5
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        train = optimizer.minimize(loss)

    # Before starting, initialize the variables. We will 'run' this first.
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for step in range(NUM_STEPS):
            sess.run(train, {x: x_data, y_true: y_data})
            if step % 5 == 0:
                print(step, sess.run([w, b]))
                wb_.append(sess.run([w, b]))
        print(50, sess.run([w, b]))

The example runs as described in the book. However, there is one thing I do not understand: why has the author not used tf.sigmoid() for the inference?

with tf.name_scope('inference') as scope:
    w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
    b = tf.Variable(0, dtype=tf.float32, name='bias')
    y_pred = tf.sigmoid(tf.matmul(w, tf.transpose(x)) + b)  # <-- the modified line

Is there something obvious that I have overlooked? Also, the results look largely similar with and without this modification.

Upvotes: 2

Views: 149

Answers (1)

Maxim

Reputation: 53758

Because the loss function tf.nn.sigmoid_cross_entropy_with_logits applies the sigmoid itself. Applying it a second time during training skews the predictions towards the center: the network becomes unable to output values close to 0 or 1. Networks with exactly this double-sigmoid problem come up in other questions as well.
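To see the compression numerically, here is a minimal NumPy sketch (not from the book or the original answer): if y_pred has already been squashed into (0, 1) by tf.sigmoid, the sigmoid inside the loss maps it into roughly (0.5, 0.73), so the probabilities used during training can never approach 0 or 1.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

logits = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

# Sigmoid applied once: spans nearly the full (0, 1) range.
print(sigmoid(logits))           # ~[0.00005, 0.12, 0.50, 0.88, 0.99995]

# Sigmoid applied twice (logits already squashed before the loss applies
# its own sigmoid): squeezed into (0.5, 0.73), never near 0 or 1.
print(sigmoid(sigmoid(logits)))  # ~[0.50, 0.53, 0.62, 0.71, 0.73]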

If you want probabilities at inference time, you can add a separate op, e.g.:

y_proba = tf.sigmoid(y_pred)

... but still use the raw y_pred in training (and y_proba at test time).
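In terms of the question's code, that could look like the sketch below. It assumes the same TF 1.x environment and reuses x_data / y_data from the question; the graph name g2 is just for illustration. The loss is still built on the raw logits y_pred, while y_proba exists only for fetching probabilities.

import tensorflow as tf  # assumes TF 1.x, as in the question

g2 = tf.Graph()
with g2.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_true = tf.placeholder(tf.float32, shape=None)

    w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
    b = tf.Variable(0, dtype=tf.float32, name='bias')

    y_pred = tf.matmul(w, tf.transpose(x)) + b  # raw logits: fed to the loss
    y_proba = tf.sigmoid(y_pred)                # probabilities: test time only

    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred))
    train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for step in range(50):
            sess.run(train, {x: x_data, y_true: y_data})  # x_data/y_data from the question
        # Probabilities for new inputs come from y_proba, not y_pred:
        print(sess.run(y_proba, {x: x_data[:5]}))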

Upvotes: 1
