Reputation: 3226
I've seen some benchmarks comparing TensorFlow and PyTorch. TensorFlow may be faster, but it doesn't seem much faster, and is sometimes even slower.
Is there any benchmark that specifically tests static graphs against dynamic graphs and demonstrates that static graphs are much faster than dynamic graphs?
Upvotes: 1
Views: 1658
Reputation: 12891
Static graphs allow a few types of optimizations, which depend on the type of graph and the environment you are running in.
A simple example of a static-graph optimization is the ability to reuse the memory of existing variables, saving the expensive allocation of new memory (see here for more details using MXNet: https://mxnet.incubator.apache.org/architecture/note_memory.html).
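To illustrate the idea, here is a minimal numpy sketch (not the MXNet mechanism from the link above): when the whole computation is known in advance, intermediates that are dead after one use can share a single preallocated buffer instead of each step allocating a fresh array.

    import numpy as np

    a = np.random.rand(1_000_000).astype(np.float32)
    b = np.random.rand(1_000_000).astype(np.float32)

    # Eager style: every step allocates a fresh intermediate array.
    t1 = a + b
    t2 = t1 * 2.0
    result = np.sqrt(t2)

    # With the full computation known up front, each intermediate is dead after
    # one use, so a single preallocated buffer can be reused for every step.
    buf = np.empty_like(a)
    np.add(a, b, out=buf)
    np.multiply(buf, 2.0, out=buf)
    np.sqrt(buf, out=buf)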
Another simple example is the ability to combine multiple operators into a single one, especially on GPUs or other specialized hardware that allows "compilation" of the code to use hardware-acceleration options. In this context, paying a bit more in "compilation" time to get a speed-up in execution time is usually a no-brainer in deep learning training. When you are training on a lot of data across many epochs, this tradeoff is negligible, as the execution time (hours or days) is much longer than the additional compilation time (seconds or minutes).
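A minimal sketch of what operator fusion buys you, assuming TensorFlow 2.x, where tf.function accepts jit_compile=True to enable XLA: run eagerly, the three elementwise ops below launch separate kernels that each read and write the full tensor, whereas the compiled static graph can fuse them into a single kernel.

    import tensorflow as tf

    # Three elementwise ops; executed eagerly they run as separate kernels,
    # each reading and writing the full tensor.
    def act(x):
        return tf.nn.relu(x * 1.5 + 0.25)

    # Compiling the static graph with XLA lets the framework fuse them into a
    # single kernel, so the tensor is read and written only once.
    fused_act = tf.function(act, jit_compile=True)

    x = tf.random.uniform([1024, 1024])
    y = fused_act(x)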
But the most powerful optimizations come when you allow parallel and distributed processing. If you have multiple GPUs or instances to speed up your training, being able to achieve linear scaling is critical: it lets you scale your models to more training data, more layers and parameters, and more epochs. Having a static graph allows the deep learning framework to optimize its execution across your resources more efficiently. See here for MXNet's support for multiple GPU instances: http://gluon.mxnet.io/chapter07_distributed-learning/training-with-multiple-machines.html.
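The links above are MXNet-specific; as a rough sketch of the same idea in TensorFlow, MirroredStrategy replicates the training-step graph across the visible GPUs and aggregates gradients automatically (the model and data here are made up purely for illustration):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
    print("replicas:", strategy.num_replicas_in_sync)

    with strategy.scope():
        # Variables created here are mirrored across devices; each training
        # step runs in parallel on every replica and gradients are combined.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="sgd", loss="mse")

    x = tf.random.uniform([1024, 32])
    y = tf.random.uniform([1024, 1])
    model.fit(x, y, batch_size=256, epochs=1)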
Upvotes: 1
Reputation: 57883
To be more precise, the speed benefit comes from "deferred execution with graph rewriting."
It's typically associated with explicit-graph frameworks (Theano/TF), but with enough engineering you could add it to execution models like numpy/PyTorch, which don't have an explicit graph. See Bohrium for an example of hacking numpy to do rewriting.
Note that the presence of this feature makes the framework less friendly for prototyping, so if you added this to PyTorch, you would get the same problems that people complain about in TensorFlow.
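As a concrete illustration of bolting deferred execution onto an eager framework, later PyTorch versions added graph capture via torch.jit.trace, which records the eagerly executed ops into a static graph that the JIT can then rewrite (a minimal sketch; the function and shapes are arbitrary):

    import torch

    def f(x, w):
        return torch.tanh(x @ w + 1.0)

    x = torch.randn(64, 128)
    w = torch.randn(128, 32)

    # Tracing runs f eagerly once and records the executed ops into a static
    # graph (TorchScript), which can be optimized before subsequent calls.
    traced = torch.jit.trace(f, (x, w))
    print(traced.graph)
    out = traced(x, w)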
As far as performance goes, here's a toy benchmark in TensorFlow showing a 5x speed-up when you turn on graph rewriting.
I crafted the example to be bottlenecked by memory bandwidth, so it's a no-brainer that graph rewriting (cwise fusion) will give a significant speed boost there. For a production LSTM model, Google reported a 1.8x speed-up when turning on graph optimizations (through XLA).
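A rough sketch of that kind of memory-bandwidth-bound toy benchmark (not the exact one above; this assumes TensorFlow 2.x, where XLA fusion is enabled via jit_compile, and actual speed-ups depend on your hardware): a long chain of cheap elementwise ops that XLA can collapse into one kernel.

    import time
    import tensorflow as tf

    def chain(x):
        # A long chain of cheap elementwise ops: memory-bandwidth bound, so
        # fusing them into one kernel avoids repeated round-trips to memory.
        for _ in range(20):
            x = tf.sin(x) + 0.5 * x
        return x

    x = tf.random.uniform([4096, 4096])

    graph_fn = tf.function(chain)                    # static graph, no XLA
    fused_fn = tf.function(chain, jit_compile=True)  # static graph + XLA fusion

    for name, fn in [("graph", graph_fn), ("graph+XLA", fused_fn)]:
        fn(x)                          # warm-up / compile
        start = time.time()
        for _ in range(10):
            result = fn(x)
        _ = result.numpy()             # block until execution finishes
        print(name, time.time() - start)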
Upvotes: 6