Ismael

Reputation: 753

tensorflow model has different results than the same model in skflow (optimizer)

I'm using tensorflow to replicate a neural network for the MNIST dataset, previously programmed in skflow. Here is the model in skflow:

import tensorflow.contrib.learn as skflow
from sklearn import metrics
from sklearn.datasets import fetch_mldata
from sklearn.cross_validation import train_test_split

mnist = fetch_mldata('MNIST original')

train_dataset, test_dataset, train_labels, test_labels = train_test_split( mnist.data, mnist.target, test_size=10000, random_state=42)

classifier = skflow.TensorFlowDNNClassifier(hidden_units=[1200, 1200], n_classes=10, optimizer="SGD", learning_rate=0.01, batch_size=128, steps=1000)
classifier.fit(train_dataset, train_labels)
score = metrics.accuracy_score(test_labels, classifier.predict(test_dataset))
print("Accuracy: %f" % score)

This model gets an accuracy of 0.950600.

But the model replicated in TensorFlow gets nan in the loss function and fails to improve (I think it's not related to Tensorflow NaN bug?, since I'm using tf.nn.softmax_cross_entropy_with_logits).

I can't figure out why, since the setup of the TensorFlow model is the same as in the skflow model. The only thing I'm unsure about is whether the weights are initialized the same way; I searched for that part in the skflow code but have not found it.

Here is the code in tensorflow:

import numpy as np
import tensorflow as tf
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original')

num_labels = len(np.unique(mnist.target))
num_pixels = mnist.data.shape[1]

# reshape labels to one-hot encoding
labels = (np.arange(num_labels) == mnist.target[:, None]).astype(np.float32)

# create train_dataset of 60000 and test_dataset of 10000 elements
train_dataset, test_dataset, train_labels, test_labels = train_test_split(mnist.data, labels, test_size=10000, random_state=42)


def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])


batch_size = 128
graph = tf.Graph()
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, num_pixels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_test_dataset = tf.cast(tf.constant(test_dataset), tf.float32)

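    # Weights are drawn from a truncated normal distribution (default
    # stddev 1.0); biases are initialized to zero.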
    w_hidden = tf.Variable(tf.truncated_normal([num_pixels, 1200]))
    b_hidden = tf.Variable(tf.zeros([1200]))
    hidden = tf.nn.relu(tf.matmul(tf_train_dataset, w_hidden) + b_hidden)

    w_hidden_2 = tf.Variable(tf.truncated_normal([1200, 1200]))
    b_hidden_2 = tf.Variable(tf.zeros([1200]))
    hidden2 = tf.nn.relu(tf.matmul(hidden, w_hidden_2) + b_hidden_2)

    w = tf.Variable(tf.truncated_normal([1200, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(hidden2, w) + b

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # Predictions for the training, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_hidden = tf.nn.relu(tf.matmul(tf_test_dataset, w_hidden) + b_hidden)
    test_hidden_2 = tf.nn.relu(tf.matmul(test_hidden, w_hidden_2) + b_hidden_2)
    test_prediction = tf.nn.softmax(tf.matmul(test_hidden_2, w) + b)

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)

        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]

        # Prepare a dictionary telling the session where to feed the minibatch.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 100 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

I'm clueless on what might be the issue. Any suggestions?

Edit 1: As suggested, I tried replacing the tf.Variable calls with tf.get_variable("w_hidden", [num_pixels, 1200]), but I still got NaNs.

I also used the skflow.ops.dnn op to build the layers while keeping my own loss and so on, and still got NaNs.

Edit 2: It turns out it was not a problem of weight initialization. It seems the gradients in the TensorFlow model are too unstable, which drives the loss to NaN. As in Adding multiple layers to TensorFlow causes loss function to become Nan, I lowered the learning rate by an order of magnitude, and it worked.
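
Concretely, the only change was the optimizer line in the graph above (0.001 being an order of magnitude below the original 0.01):

    # A smaller step size keeps the gradients from blowing up with
    # these truncated-normal initialized weights.
    optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss)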

What I still don't understand is how the SGD optimizer in skflow differs from the one above - or, if they really are the same, why they need different learning rates.

Upvotes: 2

Views: 873

Answers (1)

ilblackdragon

Reputation: 1835

Initialization in skflow relies on the tf.get_variable default initializer, uniform_unit_scaling_initializer (see this for a detailed description).

You can try replacing your tf.Variable calls with something like tf.get_variable("w_hidden", [num_pixels, 1200]).
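
Applied to your graph, that would look something like this (a sketch; with no initializer argument, tf.get_variable falls back to uniform_unit_scaling_initializer, and the constant initializers mirror your tf.zeros biases):

    with graph.as_default():
        # No initializer given: defaults to uniform_unit_scaling_initializer,
        # which scales the initial values by the layer's fan-in.
        w_hidden = tf.get_variable("w_hidden", [num_pixels, 1200])
        b_hidden = tf.get_variable("b_hidden", [1200],
                                   initializer=tf.constant_initializer(0.0))
        w_hidden_2 = tf.get_variable("w_hidden_2", [1200, 1200])
        b_hidden_2 = tf.get_variable("b_hidden_2", [1200],
                                     initializer=tf.constant_initializer(0.0))
        w = tf.get_variable("w_out", [1200, num_labels])
        b = tf.get_variable("b_out", [num_labels],
                            initializer=tf.constant_initializer(0.0))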

An alternative is to start with the skflow.ops.dnn op, which builds the layers for you while you still define your own loss and so on.
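
That could look something like this (a sketch, assuming the ops.dnn signature that takes the input tensor and a list of hidden unit counts, and reusing the names from your graph):

    import tensorflow.contrib.learn as skflow

    with graph.as_default():
        # skflow.ops.dnn builds the hidden layers (with tf.get_variable
        # initialization) and returns the last hidden layer's output.
        hidden = skflow.ops.dnn(tf_train_dataset, [1200, 1200],
                                activation=tf.nn.relu)
        # The output layer and the loss stay hand-written.
        w = tf.get_variable("w_out", [1200, num_labels])
        b = tf.get_variable("b_out", [num_labels],
                            initializer=tf.constant_initializer(0.0))
        logits = tf.matmul(hidden, w) + b
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))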

Also, please let me know if there is a clear use case that forced you to rewrite things in pure TensorFlow instead of using skflow - I would love to address it. You can always write a custom model by passing model_fn into TensorFlowEstimator and still use the training / batching / saving functionality.
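
For example, something along these lines (a sketch; my_model and its use of skflow.models.logistic_regression follow the skflow custom-model examples):

    import tensorflow.contrib.learn as skflow

    def my_model(X, y):
        # Two 1200-unit hidden layers, then a softmax output layer plus
        # loss from skflow.models.logistic_regression.
        layers = skflow.ops.dnn(X, [1200, 1200])
        return skflow.models.logistic_regression(layers, y)

    classifier = skflow.TensorFlowEstimator(model_fn=my_model, n_classes=10,
                                            optimizer="SGD", learning_rate=0.01,
                                            batch_size=128, steps=1000)
    classifier.fit(train_dataset, train_labels)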

Upvotes: 0
