Ohumeronen

Reputation: 2086

Scikit-Learn GridSearchCV: Prevent the function from copying the data for each parallel process

I use sklearn.grid_search.GridSearchCV in parallel across several CPUs/cores. Calling the fit method creates one copy of my data for each process, and the processes then crash because they run out of memory.

Is there a way to prevent the function from copying the data for each process? Can I use shared memory for all cores?

Upvotes: 2

Views: 783

Answers (1)

Alvaro Ulloa

Reputation: 75

By default, Python creates a new process for each parallel task, and each new process gets its own copy of the data. I would recommend using the shared environment provided by the multiprocessing module to avoid this. You can see an example at https://github.com/alvarouc/polyssifier/blob/master/polyssifier/polyssifier.py#L87
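To illustrate the general idea of sharing one copy of the data between worker processes, here is a minimal sketch using only the standard library (`multiprocessing.shared_memory`, which needs Python 3.8 or later). It is not the code from the linked file and it does not plug into GridSearchCV directly. The helper names `create_shared` and `partial_sum` are made up for this example, and the data is assumed to be a non-empty sequence of floats.

```python
from array import array
from multiprocessing import Pool, shared_memory

ITEM = "d"  # C double, 8 bytes per value


def create_shared(values):
    """Copy `values` once into a new shared-memory block and return the block.

    Hypothetical helper for this sketch; `values` must be non-empty.
    """
    data = array(ITEM, values)
    nbytes = len(data) * data.itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = data.tobytes()
    return shm


def partial_sum(task):
    """Worker: attach to the block by name and sum the items in [start, stop).

    Only the block name and two integers are sent to the worker,
    not the data itself.
    """
    name, start, stop = task
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Views must be released before the block can be closed.
        with shm.buf.cast(ITEM) as view, view[start:stop] as chunk:
            return sum(chunk)
    finally:
        shm.close()  # detach this process; the block itself stays alive


if __name__ == "__main__":
    values = [float(i) for i in range(1000)]
    shm = create_shared(values)
    try:
        tasks = [(shm.name, 0, 500), (shm.name, 500, 1000)]
        with Pool(2) as pool:
            print(sum(pool.map(partial_sum, tasks)))
    finally:
        shm.close()
        shm.unlink()  # free the block once every worker is done
```

Each worker receives only the name of the block and a pair of indices, so the data is written to memory once instead of being pickled and copied for every process. The creating process is responsible for calling `unlink()` once the workers have finished.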

Upvotes: 1
