Reputation: 588
I am building a neural network in Keras that includes multiple LSTM, Permute and Dense layers.
It seems LSTM is GPU-unfriendly, so I did some research and used
with tf.device('/cpu:0'):
    out = LSTM(cells)(inp)
But based on my understanding, with is essentially a try...finally block that ensures clean-up code is executed. I don't know whether the following mixed CPU/GPU code works or not. Will it speed up training?
with tf.device('/cpu:0'):
    out = LSTM(cells)(inp)
with tf.device('/gpu:0'):
    out = Permute(some_shape)(out)
with tf.device('/cpu:0'):
    out = LSTM(cells)(out)
with tf.device('/gpu:0'):
    out = Dense(output_size)(out)
Upvotes: 3
Views: 3497
Reputation: 93
I have created a model using 2 LSTM layers and 1 Dense layer and trained it on my GPU (NVidia GTX 10150Ti). Here are my observations, along with a sample snippet:
import keras

# CuDNNLSTM is the cuDNN-backed, GPU-only LSTM implementation in Keras;
# neurons, nbatch_size and reshapedX come from the data preparation (not shown).
model = keras.Sequential()
model.add(keras.layers.cudnn_recurrent.CuDNNLSTM(neurons,
          batch_input_shape=(nbatch_size, reshapedX.shape[1], reshapedX.shape[2]),
          return_sequences=True,
          stateful=True))
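For context, here is a minimal, self-contained sketch of how such a stateful CuDNNLSTM model could be compiled and fitted. The data shapes, layer sizes and variable names are made up for the example (they are not from the answer above), and it only runs on a machine with a CUDA/cuDNN-enabled GPU:

import numpy as np
import keras

# Hypothetical toy data: 64 samples, 10 timesteps, 8 features
nbatch_size = 16
reshapedX = np.random.rand(64, 10, 8)
reshapedY = np.random.rand(64, 10, 1)

model = keras.Sequential()
model.add(keras.layers.CuDNNLSTM(32,
          batch_input_shape=(nbatch_size, reshapedX.shape[1], reshapedX.shape[2]),
          return_sequences=True,
          stateful=True))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse')

# With stateful=True the sample count must be divisible by the batch size,
# and shuffling is disabled so batch order is preserved across epochs.
model.fit(reshapedX, reshapedY, batch_size=nbatch_size, epochs=2, shuffle=False)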
Upvotes: 0
Reputation: 40506
As you may read here, tf.device is a context manager which switches the default device to the one passed as its argument within the block it creates. So this code should run everything placed under '/cpu:0' on the CPU and the rest on the GPU.
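For illustration, here is a minimal sketch of the question's pattern as a complete functional-API model on the TensorFlow backend; the input shape, layer sizes and permutation are made up for the example. Each with block only affects the ops built inside it; once it exits, placement falls back to the default device.

import tensorflow as tf
from keras.layers import Input, LSTM, Permute, Dense
from keras.models import Model

inp = Input(shape=(20, 64))               # hypothetical (timesteps, features)

with tf.device('/cpu:0'):                 # ops created here are pinned to the CPU
    x = LSTM(32, return_sequences=True)(inp)

with tf.device('/gpu:0'):                 # ops created here are pinned to the GPU
    x = Permute((2, 1))(x)

with tf.device('/cpu:0'):
    x = LSTM(16)(x)

with tf.device('/gpu:0'):
    out = Dense(10)(x)

model = Model(inp, out)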
Whether it will speed up your training is really hard to answer, because it depends on the machine you use, but I wouldn't expect the computations to be faster: every device switch forces data to be copied between GPU memory and host RAM. It could even slow down your computations.
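If you want to verify where the ops actually end up instead of guessing, one option, assuming the TensorFlow 1.x backend, is to enable device-placement logging before building the model:

import tensorflow as tf
import keras.backend as K

# Print every op's device assignment when the graph runs, and let TensorFlow
# fall back to a supported device if the requested one is not available.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
K.set_session(tf.Session(config=config))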
Upvotes: 2