benbotto

Reputation: 2439

Poor Performance of TensorFlow 2 Keras Model with Variable-Length Training Data

I'm using TensorFlow 2.2.0-gpu, and I have a simple Keras model composed of a few dense layers and a linear output (see the code below). I'm training the model on variable-sized batches, and when I run the code I get warnings about tf.function retracing. From what I've read, tracing is expensive, and because each batch has a different shape, a retrace is triggered on nearly every call, so performance is poor. Here's the code, which takes about 330 seconds to run on my machine.

#import tensorflow as tf
#tf.compat.v1.disable_eager_execution()

import numpy as np
import timeit
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers

def main():
  state_input = keras.Input((2,))
  hidden1     = layers.Dense(units = 64, activation = "relu")(state_input)
  hidden2     = layers.Dense(units = 128, activation = "relu")(hidden1)
  hidden3     = layers.Dense(units = 128, activation = "relu")(hidden2)
  output      = layers.Dense(units = 2, activation = "linear")(hidden3)

  model = keras.Model(inputs = state_input, outputs = output)
  opt   = optimizers.Adam(learning_rate = 1e-4)

  model.compile(optimizer = opt, loss = "mean_squared_error")

  np.random.seed(0)

  def train():
    for i in range(2000):
      print(i)

      # Random batch size between 1e4 and 1e5: a different shape on nearly every call.
      num_samples = np.random.randint(int(1e4), int(1e5))
      x = np.random.rand(num_samples, 2)
      y = np.random.rand(num_samples, 2)

      model.train_on_batch(x, y)

  print(timeit.timeit(train, number=1))

if __name__ == "__main__":
  main()

If I disable eager execution using tf.compat.v1.disable_eager_execution() (line 2 in the code), the same code runs in about 30 seconds. This is similar to the performance I was seeing under TensorFlow 1.

Is there a way to change my model so that I get performance similar to what I see with eager execution disabled? In other words, can the model be changed so that function retracing isn't incurred on every call?

For reference, this is the warning that's generated when train_on_batch is called:

WARNING:tensorflow:10 out of the last 11 calls to <function Model.make_train_function.<locals>.train_function at 0x7f68f3724158> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

Upvotes: 1

Views: 1042

Answers (1)

benbotto

Reputation: 2439

I was able to improve the performance without disabling eager mode by using a tf.function with an input signature and applying gradients manually. (See TensorFlow's Better performance with tf.function article.) This speeds things up significantly, though disabling eager execution outright is still faster.

import tensorflow as tf
import numpy as np
import timeit
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras import losses

def main():
  state_input = keras.Input((2,))
  hidden1     = layers.Dense(units = 64, activation = "relu")(state_input)
  hidden2     = layers.Dense(units = 128, activation = "relu")(hidden1)
  hidden3     = layers.Dense(units = 128, activation = "relu")(hidden2)
  output      = layers.Dense(units = 2, activation = "linear")(hidden3)

  model = keras.Model(inputs = state_input, outputs = output)
  opt   = optimizers.Adam(learning_rate = 1e-4)
  loss  = losses.MeanSquaredError()

  np.random.seed(0)

  # A fixed signature with shape (None, 2) lets the batch size vary
  # without triggering a retrace on every call.
  @tf.function(input_signature=[
    tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
    tf.TensorSpec(shape=(None, 2), dtype=tf.float32)
  ])
  def fit(x, y):
    with tf.GradientTape() as tape:
      preds      = model(x, training = True)
      loss_value = loss(y, preds)  # MeanSquaredError expects (y_true, y_pred)
    grads = tape.gradient(loss_value, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

  def train():
    for i in range(2000):
      print(i)

      num_samples = np.random.randint(int(1e4), int(1e5))
      # Cast to float32 to match the input signature above.
      x = np.random.rand(num_samples, 2).astype(np.float32)
      y = x * 2

      fit(x, y)

  print(timeit.timeit(train, number=1))

  # Sanity check: the model should predict roughly 2x its inputs.
  print('test')
  print(model.predict(np.array([[.2, .4], [.6, .8]])))

if __name__ == "__main__":
  main()

But honestly, that's pretty ugly.
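For what it's worth, the retracing warning also points at tf.function's experimental_relax_shapes=True option. I haven't benchmarked that route, but a minimal sketch of it (reusing model, loss, and opt from the snippet above) would drop the explicit signature and let TensorFlow generalize the traced shapes after a few calls:

# Untested sketch: rather than pinning an input_signature, let tf.function
# relax traced shapes so varying batch sizes stop triggering retraces.
@tf.function(experimental_relax_shapes=True)
def fit(x, y):
  with tf.GradientTape() as tape:
    preds      = model(x, training = True)
    loss_value = loss(y, preds)
  grads = tape.gradient(loss_value, model.trainable_variables)
  opt.apply_gradients(zip(grads, model.trainable_variables))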

Here's a great question, with benchmarks, about why TF2 is so much slower than TF1: Why is TensorFlow 2 much slower than TensorFlow 1?

My actual code, which is markedly more complex than the snippet in the question, runs at about 1/10th the speed with eager execution enabled (the default). Using tf.function with a signature does speed it up, but it's still not nearly as fast as simply disabling eager execution (plus, again, using tf.function and GradientTape is pretty atrocious).

In the end I just disable eager execution. If someone comes along with a better answer then I'll gladly accept it.
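Concretely, that just means enabling the two commented-out lines from the top of the question's code, before any other TensorFlow calls:

import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # must run before any ops or models are created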

Upvotes: 2
