Sergii

Reputation: 611

Python: Parallelizing GPU and CPU work

Processing batches for my ML models takes too much time, so I am considering parallelizing that work.

Right now the preprocessing routine grabs data from the SSD, preprocesses it, and forms a data structure for learning. All this time the ML training process waits. Then the ML process takes this data and uses it to train the model, and now preprocessing waits. Then the cycle repeats. This waiting time adds up quickly and delays model training.

The plan is the following: a single data structure would store a bunch of data points. At each step the training algorithm would take a random subset of them to train the model (SGD with TensorFlow on a GPU).

In parallel with that I would like another thread to preprocess the next bunch of data points. When the preprocessing is ready, it would replace the old data structure object with the new one, and so forth.
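
For concreteness, here is a minimal sketch of what I have in mind, using Python's threading and queue modules. The constants and preprocess_next_chunk are placeholders for my real routine, and the actual TensorFlow training step is only indicated as a comment:

    import threading
    import queue

    import numpy as np

    NUM_CHUNKS = 10        # how many preprocessed chunks to produce overall
    STEPS_PER_CHUNK = 100  # SGD steps to run against each chunk
    BATCH_SIZE = 128

    chunk_queue = queue.Queue(maxsize=1)  # holds at most the one "next" chunk

    def preprocess_next_chunk():
        # Placeholder for the real routine: read from the SSD and preprocess.
        return np.random.rand(10000, 64).astype(np.float32)

    def preprocessing_worker():
        # Producer thread: prepares the next chunk while training runs.
        # NumPy (and TensorFlow) release the GIL inside their C code,
        # so this can overlap with GPU training in the main thread.
        for _ in range(NUM_CHUNKS):
            chunk_queue.put(preprocess_next_chunk())  # blocks if the trainer falls behind

    worker = threading.Thread(target=preprocessing_worker)
    worker.start()

    for _ in range(NUM_CHUNKS):
        data = chunk_queue.get()  # swap in the freshly preprocessed chunk
        for _ in range(STEPS_PER_CHUNK):
            idx = np.random.choice(len(data), BATCH_SIZE, replace=False)
            batch = data[idx]     # random subset for one SGD step
            # run_sgd_step(batch)  # hypothetical TensorFlow training step on the GPU

    worker.join()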

As this is my first attempt at parallelization in Python, I wonder whether this would work at all. Would the global interpreter lock prevent the system from doing these tasks in a truly parallel way?

Upvotes: 2

Views: 2182

Answers (1)

keveman

Reputation: 8487

TensorFlow's Python binding is very diligent about releasing the global interpreter lock as soon as possible. For instance, it does not hold the lock once control has transferred to the C++ library inside tf.Session's run method. What you describe is a very common pattern in TensorFlow. Input data preprocessing and training an ML model on the preprocessed data are decoupled in TensorFlow using queues. There is an illustrative example of how input preprocessing and training are parallelized in the Inception model.
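
To give a flavor of that queue-based pattern, here is a minimal sketch against the TensorFlow 1.x queue API (tf.FIFOQueue). The shapes, the toy model, and the random-data producer are placeholders; the Inception input pipeline is the real reference:

    import threading

    import numpy as np
    import tensorflow as tf

    BATCH_SIZE = 128
    FEATURE_DIM = 64
    NUM_BATCHES = 100

    # Queue that buffers preprocessed batches between the CPU producer
    # thread and the GPU training loop.
    q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32],
                     shapes=[[BATCH_SIZE, FEATURE_DIM]])
    batch_in = tf.placeholder(tf.float32, [BATCH_SIZE, FEATURE_DIM])
    enqueue_op = q.enqueue(batch_in)
    next_batch = q.dequeue()

    # Toy model so there is something to train.
    weights = tf.Variable(tf.random_normal([FEATURE_DIM, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(next_batch, weights)))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    def producer(sess):
        # Stand-in for real preprocessing; sess.run releases the GIL
        # while the enqueue executes in the C++ runtime.
        for _ in range(NUM_BATCHES):
            batch = np.random.rand(BATCH_SIZE, FEATURE_DIM).astype(np.float32)
            sess.run(enqueue_op, feed_dict={batch_in: batch})

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        t = threading.Thread(target=producer, args=(sess,))
        t.start()
        for _ in range(NUM_BATCHES):
            sess.run(train_op)  # dequeues one batch and runs an SGD step
        t.join()

Because both the enqueue and the training step run inside the C++ runtime with the GIL released, the CPU preprocessing thread and the GPU training loop genuinely overlap.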

Upvotes: 2
