Reputation: 493
I am writing a calibration pipeline to learn the hyperparameters for neural networks to detect properties of DNA sequences*. This therefore requires training a large number of models on the same dataset with different hyperparameters.
I am trying to optimise this to run on GPU. DNA sequence datasets are quite small compared to image datasets (typically 10s or 100s of base-pairs in 4 'channels' representing the 4 DNA bases A, C, G and T, compared to 10,000s of pixels in 3 RGB channels), and consequently a single model cannot make full use of the parallelisation available on a GPU unless multiple models are trained at the same time.
Is there a way to do this in Caffe in Python?
(I previously asked this question with reference to doing this in nolearn, lasagne or Theano, but I'm not sure it's possible so have moved on to Caffe.)
* It's based on the DeepBind model for detecting where transcription factors bind to DNA, if you're interested.
Upvotes: 4
Views: 1262
Reputation: 1459
Yes. The following options are available for this.
Option 1: Parallel training processes. Write a network definition for each hyper-parameter configuration and run a training process for each.
Pros: No need to manage resources yourself. Cons: If all networks are working off of the same data, you'll be paying the cost of copying the data from CPU to GPU RAM multiple times, which is highly redundant. I've seen this slow things down so much that I might as well have trained the different networks sequentially.
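A minimal sketch of Option 1, assuming the `caffe` binary is on your PATH and that you have already written a solver prototxt for each hyper-parameter configuration (the file names below are hypothetical):

```python
# Launch one Caffe training process per hyper-parameter configuration,
# all on the same GPU, and wait for them to finish.
import subprocess

solver_files = ["solver_lr0.01.prototxt", "solver_lr0.001.prototxt"]

procs = [
    subprocess.Popen(["caffe", "train", "--solver", solver, "--gpu", "0"])
    for solver in solver_files
]

for p in procs:
    p.wait()  # block until every training run has finished
```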
Option 2: Merge different network definitions into a single prototxt and train with a single solver. The layer and blob names have to be unique for the sub-networks to be independent.
This IPython notebook provides a demo of this approach: it does the merging of the networks into a single prototxt for you, and it also demonstrates that by keeping the layer and blob names unique for each sub-network, the combined training process reaches the same results as if you had trained each network independently.
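A minimal sketch of the merging idea, assuming each sub-network already has its own prototxt (the file names are hypothetical). Every layer name and every bottom/top blob name gets a per-network prefix so the sub-networks stay independent inside the combined definition:

```python
# Merge several network prototxts into one NetParameter with unique names.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

def load_net(path):
    net = caffe_pb2.NetParameter()
    with open(path) as f:
        text_format.Merge(f.read(), net)
    return net

merged = caffe_pb2.NetParameter()
merged.name = "merged"

for i, path in enumerate(["net_a.prototxt", "net_b.prototxt"]):
    sub = load_net(path)
    prefix = "net%d_" % i
    for layer in sub.layer:
        layer.name = prefix + layer.name
        for j in range(len(layer.bottom)):
            layer.bottom[j] = prefix + layer.bottom[j]
        for j in range(len(layer.top)):
            layer.top[j] = prefix + layer.top[j]
        merged.layer.add().CopyFrom(layer)

with open("merged.prototxt", "w") as f:
    f.write(text_format.MessageToString(merged))
```

You then point a single solver prototxt at `merged.prototxt` and train as usual; each sub-network's losses and weights evolve independently because nothing is shared between the prefixed blobs.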
Upvotes: 2
Reputation: 530
Maybe I haven't understood the question, but to train multiple networks in any framework you just have to run a separate process for each training (if you have more than one GPU, set the flags so that each process uses a different one, as sketched below).
With one GPU, the main problem we usually have in image classification is lack of memory (apart from the obvious ones), but assuming that won't be an issue for you, you should be able to run both trainings on the same GPU. Because of the context switching between processes on the GPU, you will get a slow-down in training performance. In my experience this slow-down was a factor of 2-4x, so it usually isn't worth it (but check anyway).
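A minimal sketch of the per-process setup using pycaffe directly, assuming two GPUs and two pre-written solver prototxts (the file names and GPU ids are hypothetical). Each child process is pinned to its own device with `caffe.set_device()`:

```python
# Run each training in its own process, pinned to a separate GPU.
import multiprocessing

def train(solver_path, gpu_id):
    import caffe  # import inside the child so each process gets its own context
    caffe.set_device(gpu_id)
    caffe.set_mode_gpu()
    solver = caffe.SGDSolver(solver_path)
    solver.solve()

if __name__ == "__main__":
    jobs = [("solver_a.prototxt", 0), ("solver_b.prototxt", 1)]
    workers = [multiprocessing.Process(target=train, args=j) for j in jobs]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

With a single GPU you would pass the same id to both jobs, at the cost of the context-switching slow-down described above.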
Upvotes: 0