jstaker7

Reputation: 1236

Synchronous vs asynchronous computation in TensorFlow

The TensorFlow CIFAR tutorial discusses using multiple GPUs and gives this warning:

"Naively employing asynchronous updates of model parameters leads to sub-optimal training performance because an individual model replica might be trained on a stale copy of the model parameters. Conversely, employing fully synchronous updates will be as slow as the slowest model replica."

What does this mean? Could someone provide a very simple example that illustrates this warning?

Upvotes: 3

Views: 1856

Answers (1)

Ian Goodfellow

Reputation: 2604

Suppose you have n workers.

Asynchronous means that each worker just reads the parameters, computes an update, and writes the updated parameters back, without any locking mechanism at all. The workers can freely overwrite each other's work. Suppose worker 1 is slow for some reason. It reads the parameters at time t and then tries to write its updated parameters at time t+100. In the meantime, workers 2 through n have all done many updates at time steps t+1, t+2, and so on. When the slow worker 1 finally does its write, it overwrites all of the progress the other workers have made.
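To make the overwrite concrete, here is a minimal sketch in plain Python (not the TensorFlow API); the quadratic loss, the learning rate, and the 100-step delay are all invented for illustration:

    # Toy simulation of lock-free asynchronous updates. Worker 1 reads the
    # parameters early, then writes its update long after the fast workers
    # have moved the parameters on.
    params = 0.0  # shared parameter; a single scalar stands in for the model
    lr = 0.1

    def gradient_at(p):
        # Hypothetical loss L(p) = (p - 10)^2, so dL/dp = 2 * (p - 10).
        return 2.0 * (p - 10.0)

    # Slow worker 1 reads the parameters at "time t" ...
    stale_read = params

    # ... while fast workers 2..n apply 100 updates in the meantime.
    for _ in range(100):
        params -= lr * gradient_at(params)
    print(f"after fast workers: {params:.4f}")  # ~10.0, near the optimum

    # Worker 1 finally writes at "time t+100" based on its stale read,
    # wiping out nearly everything the fast workers accomplished.
    params = stale_read - lr * gradient_at(stale_read)
    print(f"after stale write:  {params:.4f}")  # 2.0, one step from scratch

Running it shows the parameters jump from near the optimum back to the value implied by worker 1's stale read.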

Fully synchronous means that all the workers are coordinated. Every worker reads the parameters, computes a gradient, and then waits for the other workers to finish. The learning algorithm then averages all of the gradients and applies a single update based on that average. If worker 1 is very slow and takes 100 time steps to finish, while workers 2 through n all finish by time step 2, then most of the workers spend most of their time sitting idle, waiting for worker 1.
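A matching sketch of one synchronous step, again using only the Python standard library; the per-worker delays and the toy loss are hypothetical. The barrier behavior comes from waiting on every worker before applying the single averaged update:

    import time
    from concurrent.futures import ThreadPoolExecutor

    params = 0.0
    lr = 0.1
    n_workers = 4

    def worker_gradient(worker_id):
        # Worker 0 is the straggler; the others finish almost immediately.
        time.sleep(1.0 if worker_id == 0 else 0.01)
        # Same toy loss as above: L(p) = (p - 10)^2.
        return 2.0 * (params - 10.0)

    start = time.time()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(worker_gradient, range(n_workers)))

    # The averaged update happens only after every worker returns, so the
    # whole step is gated by the slowest one.
    params -= lr * sum(grads) / len(grads)
    print(f"one synchronous step took {time.time() - start:.2f}s")  # ~1s

No update is ever lost here, but the step takes about as long as the slowest worker, which is exactly the trade-off the tutorial's warning describes.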

Upvotes: 9
