Reputation: 8807
What is the best practice in TensorFlow to avoid inconsistent weights when concurrently reading and updating them?
Currently I am running simple Q-learning on a board game. The typical steps are: use the NN to choose the best move, then use reward + the chosen move's Q-value to update the current state's value.
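For reference, a minimal sketch of the update I mean (plain Python; `gamma` and the `q_values` function are placeholders of mine):

```python
gamma = 0.99  # discount factor, arbitrary choice

def q_target(reward, next_state, done, q_values):
    # One-step bootstrapped target: reward plus the discounted
    # best Q-value of the next state. q_values(state) would be
    # a single forward pass through the NN.
    if done:
        return reward
    return reward + gamma * max(q_values(next_state))
```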
Since this happens sequentially, my GPU usage is really low (around 10%). To speed things up, I plan to run multiple agents and use a queue to store the data points, processing them in batches to update the weights.
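Concretely, something like this is what I have in mind (the helpers `play_one_step` and `train_on_batch` are hypothetical placeholders for my own code):

```python
import queue
import threading

transitions = queue.Queue(maxsize=10000)
BATCH_SIZE = 64

def agent_loop(agent_id):
    # Each agent plays independently and enqueues (s, a, r, s') tuples.
    while True:
        transition = play_one_step(agent_id)  # hypothetical self-play helper
        transitions.put(transition)

def trainer_loop():
    # A single trainer drains the queue and applies one batched update.
    while True:
        batch = [transitions.get() for _ in range(BATCH_SIZE)]
        train_on_batch(batch)  # hypothetical: one sess.run of the train op

for i in range(4):  # e.g. four agent threads
    threading.Thread(target=agent_loop, args=(i,), daemon=True).start()
threading.Thread(target=trainer_loop, daemon=True).start()
```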
TensorFlow provides SyncReplicasOptimizer, but based on the documentation, it has a barrier that waits for all workers to finish at each step, updates the weights, then resumes all workers. This will still result in low GPU utilization while all the other threads wait for the longest worker.
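For reference, this is roughly how I understand it would be wired up (the optimizer choice, worker count, and dummy loss are placeholders, not my real setup):

```python
import tensorflow as tf

num_workers = 4  # placeholder
global_step = tf.train.get_or_create_global_step()

# Stand-in for my real Q-loss, just to make the wiring concrete.
w = tf.get_variable("w", shape=[1])
loss = tf.reduce_sum(tf.square(w))

# Wrap a regular optimizer; gradients from all replicas are aggregated
# before a single update is applied -- this is the barrier I mentioned.
opt = tf.train.SyncReplicasOptimizer(
    tf.train.AdamOptimizer(1e-3),
    replicas_to_aggregate=num_workers,
    total_num_replicas=num_workers)
train_op = opt.minimize(loss, global_step=global_step)
```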
I want to achieve higher speed by removing the barrier. This means workers read the NN's weights to compute scores while the trainer thread is updating them.
What's the best practice to avoid the data race but still achieve full GPU utilization?
Upvotes: 3
Views: 119
Reputation: 770
You could use two separate networks.
One is built on the GPU, where the backprop takes place. The other is built on the CPU, and its ops are shared between all the agent threads so they can use it to get the scores and take the optimal action.
After every k iterations, you can assign the weights of the GPU network to the CPU network using tf.assign.
This allows for higher GPU utilisation, and better convergence, as the CPU network acts as a target network that's seldom updated, and therefore leads to less variance in the loss.
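A minimal sketch of that copy step, assuming TF1-style graphs and variable scope names of my own choosing ("gpu_net" / "cpu_net"); the architecture and k are placeholders:

```python
import tensorflow as tf

def build_net(scope, device):
    # Two copies of the same architecture under separate scopes,
    # one pinned to the GPU (training), one to the CPU (inference).
    with tf.device(device), tf.variable_scope(scope):
        x = tf.placeholder(tf.float32, [None, 64], name="state")
        h = tf.layers.dense(x, 128, activation=tf.nn.relu)
        q = tf.layers.dense(h, 4, name="q_values")
        return x, q

gpu_x, gpu_q = build_net("gpu_net", "/gpu:0")
cpu_x, cpu_q = build_net("cpu_net", "/cpu:0")

# One tf.assign per variable pair, grouped into a single sync op.
# Both nets are built in the same order, so the collections line up.
gpu_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="gpu_net")
cpu_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="cpu_net")
sync_op = tf.group(*[tf.assign(c, g) for c, g in zip(cpu_vars, gpu_vars)])

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # ... run training ops on the GPU network here ...
        if step % 1000 == 0:   # k = 1000, arbitrary
            sess.run(sync_op)  # copy GPU weights into the CPU target network
```

The agent threads only ever run the CPU network's ops, so the GPU stays busy with training and the stale-but-consistent CPU copy serves as the target network.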
Upvotes: 1