lhlmgr

Reputation: 2187

Eager-Mode very slow (~22x slower than Graph-Mode)

I read that TensorFlow 2.0 will bring some major changes, and a big part of that will be eager execution [1], so I tried playing around a bit with TensorFlow's eager mode.

I took code from a GitHub repo and tried to run it in eager mode (without using the Keras Model/Layers as proposed, though). It turned out to be quite slow. So I tried different modifications and compared them against the original (graph-mode) source of the model. The result is that graph mode is around 22x faster than eager mode. It is totally clear to me that graph mode is faster, but by that much?

Is this always the case, or do I need some special modifications/configuration of the variables to get performance comparable to graph mode?

The source code for both variants can be found at [2].

Thanks in advance!

Eager-Mode:

# With 
#  with tf.device("/gpu:0"):
#    ...
#
# Runtime is 0.35395
# Runtime is 0.12711
# Runtime is 0.12438
# Runtime is 0.12428
# Runtime is 0.12572
# Runtime is 0.12593
# Runtime is 0.12505
# Runtime is 0.12527
# Runtime is 0.12418
# Runtime is 0.12340

Graph Mode:

# Runtime is 0.81241
# Runtime is 0.00573
# Runtime is 0.00573
# Runtime is 0.00570
# Runtime is 0.00555
# Runtime is 0.00564
# Runtime is 0.00545
# Runtime is 0.00540
# Runtime is 0.00591
# Runtime is 0.00574

[1] https://groups.google.com/a/tensorflow.org/forum/#!topic/developers/JHDpgRyFVUs

[2] https://gist.github.com/lhlmgr/f6709e5aba4a5314b5221d58232b09bd

Upvotes: 3

Views: 3294

Answers (1)

ash

Reputation: 6751

Using eager execution may mean undoing some habits developed with TensorFlow graphs, since code snippets that used to run once (e.g., the Python function that constructs the graph to compute the loss) will now run repeatedly (the same Python function will compute the loss on every iteration).
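
As a toy illustration (this snippet is mine, not from the linked gist), the Python body below runs again on every call in eager mode, whereas graph mode would execute it only once to build the graph:

import time
import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x API; in TF 2.x eager is on by default

def compute_loss(x):
  # In graph mode this body would run once to build the graph; in eager
  # mode the loop and every op call below execute again on each iteration,
  # so the per-op Python overhead is paid every time.
  loss = tf.reduce_sum(tf.square(x))
  for _ in range(10):
    loss += tf.reduce_sum(tf.sin(x))
  return loss

x = tf.random_normal([1000])
start = time.time()
for _ in range(100):
  compute_loss(x)
print("Runtime is %.5f" % ((time.time() - start) / 100))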

I took a cursory look at the code links provided and noticed some easy wins that standard Python profiling tools would probably also surface. You may want to use those (cProfile, py-spy, etc.).
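
For instance, something like this will show where the Python time goes (here train_step is just a stand-in name for whatever function runs one eager iteration in your code):

import cProfile
import pstats

# Profile 100 eager iterations and print the 20 most expensive calls.
cProfile.run("for _ in range(100): train_step()", "eager.prof")
pstats.Stats("eager.prof").sort_stats("cumulative").print_stats(20)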

For example, the Keras network is currently implemented as:

class NFModel(tf.keras.Model):
  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

  def call(self, *args, **kwargs):
    num_layers = 6
    d, r = 2, 2
    bijectors = []

    for i in range(num_layers):
      with tf.variable_scope('bijector_%d' % i):
        V = tf.get_variable('V', [d, r], dtype=DTYPE)  # factor loading
        shift = tf.get_variable('shift', [d], dtype=DTYPE)  # affine shift
        L = tf.get_variable('L', [d * (d + 1) / 2], dtype=DTYPE)  # lower triangular
        bijectors.append(tfb.Affine(
          scale_tril=tfd.fill_triangular(L),
          scale_perturb_factor=V,
          shift=shift,
        ))

        alpha = tf.get_variable('alpha', [], dtype=DTYPE)
        abs_alpha = tf.abs(alpha) + .01
        bijectors.append(LeakyReLU(alpha=abs_alpha))

    base_dist = tfd.MultivariateNormalDiag(loc=tf.zeros([2], DTYPE))
    mlp_bijector = tfb.Chain(list(reversed(bijectors[:-1])), name='2d_mlp_bijector')
    dist = tfd.TransformedDistribution(distribution=base_dist, bijector=mlp_bijector)

    return {"dist": dist}

Instead, if you create the variables in __init__ once and avoid tf.get_variable calls on every call to the network, you should see a big improvement.

class NFModel(tf.keras.Model):
  def __init__(self, *args, **kwargs):
    super(NFModel, self).__init__(*args, **kwargs)
    num_layers = 6
    d, r = 2, 2
    self.num_layers = num_layers
    self.V = [tf.get_variable('V', [d, r], dtype=DTYPE) for _ in range(num_layers)]
    self.shift = [tf.get_variable('shift', [d], dtype=DTYPE) for _ in range(num_layers)]
    self.L = [tf.get_variable('L', [d * (d + 1) / 2], dtype=DTYPE) for _ in range(num_layers)]
    self.alpha = [tf.get_variable('alpha', [], dtype=DTYPE) for _ in range(num_layers)]

  def call(self, *args, **kwargs):
    bijectors = []

    for i in range(self.num_layers):
      V = self.V[i]
      shift = self.shift[i]
      L = self.L[i]
      bijectors.append(tfb.Affine(
        scale_tril=tfd.fill_triangular(L),
        scale_perturb_factor=V,
        shift=shift,
      ))

      alpha = self.alpha[i]
      abs_alpha = tf.abs(alpha) + .01
      bijectors.append(LeakyReLU(alpha=abs_alpha))

    base_dist = tfd.MultivariateNormalDiag(loc=tf.zeros([2], DTYPE))
    mlp_bijector = tfb.Chain(list(reversed(bijectors[:-1])), name='2d_mlp_bijector')
    dist = tfd.TransformedDistribution(distribution=base_dist, bijector=mlp_bijector)

    return {"dist": dist}

There are probably other such easy wins; a profiling tool will nudge you in the right direction.

Also note that TF 2.0 is less about "eager execution" and more about how one interacts with graphs, as per the RFC.
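
The direction that RFC describes landed in TF 2.x as tf.function: you keep eager-style code but let TensorFlow trace it into a graph. A rough sketch (TF 2.x API, not part of the code above):

import tensorflow as tf  # assumes TF 2.x

@tf.function  # traces the Python body into a graph on the first call
def train_step(model, optimizer, x):
  with tf.GradientTape() as tape:
    loss = -tf.reduce_mean(model(x)["dist"].log_prob(x))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss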

Hope that helps.

Upvotes: 2
