Reputation: 8807
What is the best practice in TensorFlow to avoid inconsistent weights when concurrently reading and updating them?
Currently I am running simple Q-learning on a board game. The typical steps are: use the NN to choose the best move, then use reward + the chosen move's Q-value to update the current state's value.
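For reference, a minimal sketch of the update I mean (plain Python; `gamma` and the `q_values` function are placeholders of mine):

```python
gamma = 0.99  # discount factor, arbitrary choice

def q_target(reward, next_state, done, q_values):
    # One-step bootstrapped target: reward plus the discounted
    # best Q-value of the next state. q_values(state) would be
    # a single forward pass through the NN.
    if done:
        return reward
    return reward + gamma * max(q_values(next_state))
```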
Since this happens sequentially, my GPU usage is really low (around 10%). To speed things up, I plan to run multiple agents and use a queue to store the data points, processing them in batches to update the weights.
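Concretely, something like this is what I have in mind (the helpers `play_one_step` and `train_on_batch` are hypothetical placeholders for my own code):

```python
import queue
import threading

transitions = queue.Queue(maxsize=10000)
BATCH_SIZE = 64

def agent_loop(agent_id):
    # Each agent plays independently and enqueues (s, a, r, s') tuples.
    while True:
        transition = play_one_step(agent_id)  # hypothetical self-play helper
        transitions.put(transition)

def trainer_loop():
    # A single trainer drains the queue and applies one batched update.
    while True:
        batch = [transitions.get() for _ in range(BATCH_SIZE)]
        train_on_batch(batch)  # hypothetical: one sess.run of the train op

for i in range(4):  # e.g. four agent threads
    threading.Thread(target=agent_loop, args=(i,), daemon=True).start()
threading.Thread(target=trainer_loop, daemon=True).start()
```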
TensorFlow provides SyncReplicasOptimizer, but based on the documentation, it has a barrier that waits for all workers to finish at each step, updates the weights, then resumes all workers. This will still result in low GPU utilization while all the other threads wait for the longest worker.
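For reference, this is roughly how I understand it would be wired up (the optimizer choice, worker count, and dummy loss are placeholders, not my real setup):

```python
import tensorflow as tf

num_workers = 4  # placeholder
global_step = tf.train.get_or_create_global_step()

# Stand-in for my real Q-loss, just to make the wiring concrete.
w = tf.get_variable("w", shape=[1])
loss = tf.reduce_sum(tf.square(w))

# Wrap a regular optimizer; gradients from all replicas are aggregated
# before a single update is applied -- this is the barrier I mentioned.
opt = tf.train.SyncReplicasOptimizer(
    tf.train.AdamOptimizer(1e-3),
    replicas_to_aggregate=num_workers,
    total_num_replicas=num_workers)
train_op = opt.minimize(loss, global_step=global_step)
```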
I want to achieve higher speed by removing the barrier. This means workers read the NN's weights to compute scores while the trainer thread is updating them.
What's the best practice to avoid the data race but still achieve full GPU utilization?
Upvotes: 3
Views: 119
Reputation: 770
You could use two separate networks.
One is built on the GPU, where the backprop takes place. The other is built on the CPU, and its ops are shared between all the agent threads so they can use it to get the scores and take the optimal action.
After every k iterations, you can assign the weights of the GPU network to the CPU network using tf.assign.
This allows for higher GPU utilisation, and better convergence, as the CPU network acts as a target network that's seldom updated, and therefore leads to less variance in the loss.
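A minimal sketch of that copy step, assuming TF1-style graphs and variable scope names of my own choosing ("gpu_net" / "cpu_net"); the architecture and k are placeholders:

```python
import tensorflow as tf

def build_net(scope, device):
    # Two copies of the same architecture under separate scopes,
    # one pinned to the GPU (training), one to the CPU (inference).
    with tf.device(device), tf.variable_scope(scope):
        x = tf.placeholder(tf.float32, [None, 64], name="state")
        h = tf.layers.dense(x, 128, activation=tf.nn.relu)
        q = tf.layers.dense(h, 4, name="q_values")
        return x, q

gpu_x, gpu_q = build_net("gpu_net", "/gpu:0")
cpu_x, cpu_q = build_net("cpu_net", "/cpu:0")

# One tf.assign per variable pair, grouped into a single sync op.
# Both nets are built in the same order, so the collections line up.
gpu_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="gpu_net")
cpu_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="cpu_net")
sync_op = tf.group(*[tf.assign(c, g) for c, g in zip(cpu_vars, gpu_vars)])

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # ... run training ops on the GPU network here ...
        if step % 1000 == 0:   # k = 1000, arbitrary
            sess.run(sync_op)  # copy GPU weights into the CPU target network
```

The agent threads only ever run the CPU network's ops, so the GPU stays busy with training and the stale-but-consistent CPU copy serves as the target network.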
Upvotes: 1