emielke

Reputation: 5

Slow TensorFlow training and evaluation on GPU

So I am performing some research and have lots of velocity and acceleration data of an object being moved by two people together around a room. Previously, I have successfully trained a time-series prediction neural net using LSTM and RNN to get a prediction of what the object's velocity will be one time step into the future.

After training this NN, I then augmented it to use its own prediction, along with the previous data, to predict another time step into the future, and so on for a certain number of time steps. I've included a graphic of what this looks like, NN_explanation. Essentially, I use the previous data (of size N time steps by M inputs) to predict one step, append that prediction to the end of the inputs, drop the first time step of the input (to keep the size N x M), and run the prediction again for the next time step, until I have P future predictions to compare against the measured data. A rough sketch of this rollout is shown below.
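To make this concrete, here is a minimal sketch of that rollout (predict_fn, window, and P are just placeholder names, not my actual code, and it assumes the predicted step has the same M features as the inputs):

import numpy as np

def rollout(predict_fn, window, P):
    # predict_fn maps an (N, M) window to the next (M,) step (assumption)
    preds = []
    for _ in range(P):
        next_step = predict_fn(window)           # one-step-ahead prediction
        preds.append(next_step)
        # slide the window: drop the oldest step, append the prediction
        window = np.vstack([window[1:], next_step])
    return np.array(preds)                       # shape (P, M)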

Here are my variables

import tensorflow as tf
from tensorflow.contrib import rnn

x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_outputs])
W = {'hidden': tf.Variable(tf.random_normal([n_inputs, n_nodes])),
     'output': tf.Variable(tf.random_normal([n_nodes, n_outputs]))}
bias = {'hidden': tf.Variable(tf.random_normal([n_nodes], mean=1.0)),
        'output': tf.Variable(tf.random_normal([n_outputs]))}

Here is my model

def model(x, y, W, bias):
    # reshape (batch, n_steps, n_inputs) into a list of n_steps tensors
    # of shape (batch, n_nodes), as expected by static_rnn
    x = tf.transpose(x, [1, 0, 2])
    x = tf.reshape(x, [-1, n_inputs])
    x = tf.nn.relu(tf.matmul(x, W['hidden']) + bias['hidden'])
    x = tf.split(x, n_steps, 0)
    # stack n_layers LSTM cells
    cells = []
    for _ in xrange(n_layers):
        lstm_cell = rnn.BasicLSTMCell(n_nodes, forget_bias=1.0, state_is_tuple=True)
        cells.append(lstm_cell)
    lstm_cells = rnn.MultiRNNCell(cells, state_is_tuple=True)
    outputs, states = rnn.static_rnn(lstm_cells, x, dtype=tf.float32)
    output = outputs[-1]  # output of the last time step
    return tf.matmul(output, W['output']) + bias['output']
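For reference, the model is wired up roughly like this (the loss, optimizer, and batch names below are illustrative rather than my exact training code):

prediction = model(x, y, W, bias)
loss = tf.reduce_mean(tf.square(prediction - y))       # assumed MSE loss
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # assumed optimizer

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
    pred = sess.run(prediction, feed_dict={x: batch_x})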

So I have two questions:

1] When I train this NN on a Titan X GPU, it takes longer than on my CPU. I read somewhere that this might be due to the nature of LSTM cells. Is this true? If so, is there any way I can make the training of this network go faster on my GPU, or am I just stuck with it being slow?

2] After training is completed, I want to run the prediction in real time with real data. Unfortunately, a single sess.run(prediction, feed_dict) call takes 0.05 seconds. If I want more than one future prediction step, say 10 future steps, running a loop to get 10 predictions then takes 0.5 seconds, which is not realistic for my application (see the sketch of the loop below). Is there a reason it is taking so long to evaluate? I have tried reducing the number of time steps (n_steps), as well as the number of future steps to predict, and that does reduce the prediction time. But I feel like that should only affect the training time, since at evaluation time the NN has already been trained and should simply be cramming numbers through the GPU. Any ideas?
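The evaluation loop looks roughly like this (initial_window is a placeholder name, and it assumes n_outputs == n_inputs so the prediction can be appended to the window):

import time
import numpy as np

window = initial_window                     # shape (1, n_steps, n_inputs)
start = time.time()
for _ in range(10):                         # 10 future steps
    next_step = sess.run(prediction, feed_dict={x: window})
    # drop the oldest time step and append the new prediction
    window = np.concatenate([window[:, 1:, :], next_step[:, None, :]], axis=1)
print('10-step rollout: %.3f s' % (time.time() - start))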

Upvotes: 0

Views: 1244

Answers (1)

Mark

Reputation: 414

Regarding question 1: Not all NNs benefit from using a GPU. A GPU is good where there is a large number of multiplications that can be parallelized, which is why it is very good at running convolutional neural networks. When it comes to RNNs, however, the CPU is most probably your best bet. If you have the resources, you could use Google Cloud ML Engine and run it on a cluster of CPUs instead.
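For example, if the GPU turns out to be slower for this RNN, you could pin the graph to the CPU explicitly (just a sketch, reusing the model function from your question):

with tf.device('/cpu:0'):
    prediction = model(x, y, W, bias)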

Regarding question 2: TensorFlow has a large per-call overhead when using sess.run(). However, I think in one of the recent releases they introduced some functionality to convert the network into a standalone executable form. Someone with more direct experience should weigh in on that, though.
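What I may be thinking of is freezing the graph, i.e. converting the variables to constants so the model can be loaded without a checkpoint (this is a guess at the feature, and the output node name below is an assumption, not taken from your code):

from tensorflow.python.framework import graph_util

# 'prediction' is an assumed name for the output op
frozen = graph_util.convert_variables_to_constants(
    sess, sess.graph_def, ['prediction'])
tf.train.write_graph(frozen, './export', 'frozen_model.pb', as_text=False)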

Upvotes: 2
