Reputation: 11
I am currently developing and profiling a neural network in TensorFlow on a GTX 1070. My first implementation turned out to be rather inefficient due to some expensive FFT ops. The upper timeline in the attached picture shows that one training step takes about 115 ms. A second (mathematically equivalent) implementation seems to consume much less time: one step takes under 10 ms.
timelines (Sorry, I am not allowed to post pics yet)
Surprisingly, both methods take approximately the same time when measured in Python via:
# build model, get train_op ...
sess = tf.Session()
import timeit
t = timeit.default_timer()
for i in range(100):
    sess.run(train_op)
dt = timeit.default_timer() - t
print(dt)
In both cases 100 runs take ~17 s, which is more than 100 * 115 ms and far more than 100 * 10 ms; that gap is bigger than I would expect from Python overhead alone. The network is fed using queues, but the results are the same when feeding numpy arrays directly into the graph, so I concluded that empty queues cannot be the bottleneck.
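For reference, the direct-feeding variant I timed looks roughly like this (input_ph, x_batch and the shapes are only illustrative, not the actual names in my graph):
import numpy as np
x_batch = np.random.rand(32, 128).astype(np.float32)  # dummy batch, real data differs
t = timeit.default_timer()
for i in range(100):
    sess.run(train_op, feed_dict={input_ph: x_batch})  # bypasses the queues entirely
dt = timeit.default_timer() - t
print(dt)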
The timelines are recorded in essentially the same way; the only difference is that I measure two consecutive runs:
# build model, get train_op ...
sess = tf.Session()

def write_timeline(path):
    from tensorflow.python.client import timeline
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(train_op, options=run_options, run_metadata=run_metadata)
    # Create the Timeline object, and write it to a json file
    tl = timeline.Timeline(run_metadata.step_stats)
    ctf = tl.generate_chrome_trace_format()
    with open(path, 'w') as f:
        f.write(ctf)

write_timeline('timeline_1st_run.json')
write_timeline('timeline_2nd_run.json')
The first run always takes much longer than the following runs. Is TensorFlow doing some optimization here, or allocating memory for the first time?
Can someone explain these inconsistent results? Why does my second implementation take the same time measured in Python when the profiling timeline shows that it should run much faster?
Upvotes: 1
Views: 1547
Reputation: 57923
The first run can take longer because of global memory pool growth, PTX compilation, and graph optimizations.
You could turn off some of the optimizations as follows:
sess = tf.Session(config=tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0))))
However, PTX compilation can still add 1-2 seconds to the first run, so for profiling purposes people usually include a "pre-warming" run that executes the same computation before the run being profiled.
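For example, a warm-up step before writing the first timeline could look like this (a sketch reusing the write_timeline function from the question; the file name is only illustrative):
# run the graph once untraced so PTX compilation and memory-pool
# growth happen before the profiled runs
sess.run(train_op)
write_timeline('timeline_after_warmup.json')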
Also, if your train_op is something like assign or assign_add, running train_op will fetch the result back into Python. Python<->TensorFlow memory transfers are not reflected in the timeline (grpc transfers are also missing; there is a tracking issue for this). For profiling purposes you could run the op and omit the Python transfer as follows:
sess.run(train_op.op)
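For example (v and increment are illustrative names, not taken from your graph):
v = tf.Variable(0.0)
increment = tf.assign_add(v, 1.0)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(increment))   # 1.0, the updated value is copied back into Python
sess.run(increment.op)       # runs the same op but returns None, no fetch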
Upvotes: 1