Reputation: 48456
I've been experimenting with simple, intro-tutorial-level neural networks in various frameworks, but I'm confused about the performance I'm seeing in TensorFlow.
For example, the simple network from Michael Nielsen's tutorial (MNIST digit recognition with 30 hidden nodes, trained by stochastic gradient descent with L2 regularization) performs much worse in TensorFlow, taking about 8x as long per epoch with all the same parameters, than a slightly adapted version of Nielsen's basic NumPy code (vectorized over the mini-batch, as suggested in one of the tutorial's exercises; I sketch that update after the TensorFlow code below).
Does TensorFlow, running on a single CPU, always perform this badly? Are there settings I should tweak to improve performance? Or does TensorFlow only really shine with much more complex networks or learning regimes, so that it is not expected to do well for such simple toy cases?
from __future__ import (absolute_import, print_function, division, unicode_literals)
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import time
# Parameter initializers: truncated-normal weights, small constant biases
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))
def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))
mnist = input_data.read_data_sets("./data/", one_hot=True)
sess = tf.Session()
# Inputs and outputs
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
# Model parameters
W1 = weight_variable([784, 30])
b1 = bias_variable([30])
o1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1, name='o1')
W2 = weight_variable([30, 10])
b2 = bias_variable([10])
y = tf.nn.softmax(tf.matmul(o1, W2) + b2, name='y')
sess.run(tf.initialize_all_variables())
# Cross-entropy loss with L2 weight regularization, trained by plain SGD
loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
loss += 0.1/1000 * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2))
train_step = tf.train.GradientDescentOptimizer(0.15).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)), tf.float32))
# 30 epochs of mini-batch SGD (batch size 40), timing each epoch
for ep in range(30):
    start = time.time()
    for mb in range(int(len(mnist.train.images)/40)):
        batch_xs, batch_ys = mnist.train.next_batch(40)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
    print("epoch %d: %.2fs, test accuracy %.4f" % (ep, time.time() - start, acc))
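For reference, the vectorized mini-batch update in my adapted NumPy version looks roughly like this (a sketch of the idea rather than the exact code; shapes match the 784-30-10 network above, and the L2 regularization term is omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(X, Y, W1, b1, W2, b2, eta=0.15):
    """One SGD step on a whole mini-batch: X is (batch, 784), Y is (batch, 10)."""
    m = X.shape[0]
    # Forward pass over the entire mini-batch at once
    a1 = sigmoid(X.dot(W1) + b1)               # (batch, 30)
    a2 = sigmoid(a1.dot(W2) + b2)              # (batch, 10)
    # Backward pass (cross-entropy output error)
    delta2 = a2 - Y                            # (batch, 10)
    delta1 = delta2.dot(W2.T) * a1 * (1 - a1)  # (batch, 30)
    # Gradient updates, averaged over the mini-batch
    W2 -= eta / m * a1.T.dot(delta2)
    b2 -= eta / m * delta2.sum(axis=0)
    W1 -= eta / m * X.T.dot(delta1)
    b1 -= eta / m * delta1.sum(axis=0)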
Upvotes: 1
Views: 234
Reputation: 48320
Yes, I would expect a hand-coded, specialized simple network running on the CPU to be faster than the TensorFlow version. The reason usually comes down to the graph evaluation system TensorFlow uses: every sess.run() call pays a fixed cost for dispatching the graph and feeding data, and for a tiny network that overhead can dominate the actual computation.
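To make that concrete, here is a rough micro-benchmark sketch (my own illustration, using the same TF 1.x API as in your code) that pushes the same amount of data through a tiny graph either as many small sess.run() calls or as one large call:

import time
import numpy as np
import tensorflow as tf

# Tiny graph: a single matmul, just to isolate the per-sess.run() dispatch cost
x = tf.placeholder(tf.float32, shape=[None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y = tf.matmul(x, W)

sess = tf.Session()
sess.run(tf.initialize_all_variables())

small = np.zeros((40, 784), dtype=np.float32)    # one mini-batch of 40
large = np.zeros((4000, 784), dtype=np.float32)  # the same data as 100 batches

start = time.time()
for _ in range(100):
    sess.run(y, feed_dict={x: small})
print("100 small sess.run() calls: %.3fs" % (time.time() - start))

start = time.time()
sess.run(y, feed_dict={x: large})
print("1 large sess.run() call:    %.3fs" % (time.time() - start))

The gap between the two timings is roughly the fixed per-call cost that a hand-rolled NumPy loop never pays.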
The benefit of TensorFlow shows up when you have much more complex algorithms: you can test for correctness first and then easily port the same code to more machines and more processing units.
For example, one thing you can try is to run your code on a machine with a GPU: without changing anything in the code you should see a speedup, possibly beyond the hand-coded example you linked, whereas the hand-written NumPy code would take considerable effort to port to the GPU.
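If you do try a GPU, one quick way to confirm where the ops are placed (an illustrative sketch, not part of your code) is to enable device-placement logging when creating the session:

# Log which device (CPU or GPU) each op lands on; the model code itself is unchanged
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))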
Upvotes: 1