Reputation: 2086
I use sklearn.grid_search.GridSearchCV in parallel across several CPU cores. Calling the fit method creates a copy of my data for each process, so the processes run out of memory and crash.
Is there a way to prevent the function from copying the data for each process? Can I use shared memory for all cores?
Upvotes: 2
Views: 783
Reputation: 75
Python's multiprocessing starts a separate process for each parallel worker, and each of those processes ends up with its own copy of the data. To avoid that, I would recommend sharing the data between processes through multiprocessing's shared-memory facilities. You can see an example in https://github.com/alvarouc/polyssifier/blob/master/polyssifier/polyssifier.py#L87
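To illustrate the general idea, here is a minimal sketch using only the standard library's multiprocessing.shared_memory module (Python 3.8+). It is not what GridSearchCV does internally and is not taken from the linked file; the helper names create_shared and partial_sum are made up for this example. The data is written once into a shared block, and each worker attaches to that block by name instead of receiving its own copy. With NumPy, the same buffer could be wrapped as an array via np.ndarray(shape, dtype=..., buffer=shm.buf).

```python
from multiprocessing import Pool, shared_memory

ITEM = "d"       # C double
ITEM_SIZE = 8    # bytes per double


def create_shared(values):
    """Copy the values once into a new shared-memory block and return it."""
    shm = shared_memory.SharedMemory(create=True, size=max(1, len(values)) * ITEM_SIZE)
    # View the raw bytes as doubles; the view is released when the block exits.
    with shm.buf.cast(ITEM) as view:
        for i, v in enumerate(values):
            view[i] = v
    return shm


def partial_sum(args):
    """Worker: attach to the block by name and sum the values in [start, stop)."""
    name, start, stop = args
    shm = shared_memory.SharedMemory(name=name)  # attaches, does not copy
    try:
        with shm.buf.cast(ITEM) as view:
            return sum(view[i] for i in range(start, stop))
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()


if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    shm = create_shared(data)
    try:
        step = 25_000
        # Only the block name and index bounds are sent to the workers.
        chunks = [(shm.name, i, i + step) for i in range(0, len(data), step)]
        with Pool(4) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)
    finally:
        shm.close()
        shm.unlink()  # free the shared block once all workers are done
```

Only the creating process calls unlink(); workers just close() their handle. Whether this can be plugged into a grid search depends on how the search hands data to its workers, so treat it as a demonstration of the mechanism rather than a drop-in fix.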
Upvotes: 1