Reputation: 41
When I use the parameter n_jobs in cross_val_score(), it works well when my data scale is small, but as I enlarge the scale of my data, the multiprocessing does not seem to work any more.
How can this be?
P.S.: I use IPython
Upvotes: 2
Views: 3248
Reputation: 1
n_jobs does the job and helps.
Given your "small scale" multiprocessing experience was satisfactory, there is no reason for n_jobs to cease accelerating the actual processing on larger scales.
Machine-learning processing pipelines in many cases exhibit O( N^f ) polynomial complexity, where N is the number of examples in the dataset of observations ( X ) and the exponent f is cited in various papers to be, for various learning algorithms, not less than ~2.
As the dataset sizes grow, the time-to-compute expands polynomially.
Whereas even the n_jobs help can ( in the most optimistic cases ) reduce that growth by no more than a constant factor of 1 / { 2, 3, 4, .., 8, .., <max_cpu_cores> }
( and this constant factor is further remarkably reduced by Amdahl's Law ( ref. the tables below ), depending on how much of the work can actually run in parallel across such n_jobs v/s how much work still remains serially executed, irrespective of how many jobs one has set to launch and run. Tweaking O/S process priorities helps a bit, but large N-s still grow a lot, so the more The Time is Money applies here ).
Some additional adverse effects come from the need to allocate replicated data-structures, so [PSPACE] adds to the already painful [PTIME] headaches.
Even the most promising ML-learners grow in the [TIME] dimension of complexity as O( N log( N ) ).
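To put that growth into plain numbers, here is a minimal sketch ( illustrative assumptions only: an SVC learner, which is known to scale roughly as O( N^2..3 ), and synthetic data from make_classification; neither is the OP's setup ) that times .fit() at a few growing N-s:

# A minimal sketch of observing the polynomial growth of the time-to-compute:
# time a learner's .fit() at a few growing dataset sizes N.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

for N in ( 1_000, 2_000, 4_000, 8_000 ):                 # growing dataset sizes
    X, y = make_classification( n_samples    = N,
                                n_features   = 20,
                                random_state = 0 )
    t0   = time.perf_counter()
    SVC().fit( X, y )                                    # ~ O( N^2..3 ) learner
    print( f"N = {N:6d}  .fit() took {time.perf_counter() - t0:8.2f} [s]" )

Doubling N should be seen to much more than double the wall-clock time, which is exactly the polynomial growth discussed above.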
n_jobs always helps a lot; you would have to wait way longer without using n_jobs. Just try to run the same problem with n_jobs = 1 and with n_jobs = -1.
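A minimal sketch of that test ( the RandomForestClassifier, the synthetic dataset and the 5-fold CV are illustrative assumptions, not the OP's pipeline ):

# A minimal sketch of the proposed test: run the very same cross_val_score()
# once with n_jobs = 1 and once with n_jobs = -1 and compare wall-clock times.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification( n_samples = 20_000, n_features = 20, random_state = 0 )
clf  = RandomForestClassifier( n_estimators = 100, random_state = 0 )

for jobs in ( 1, -1 ):                                   # serial v/s all CPU cores
    t0     = time.perf_counter()
    scores = cross_val_score( clf, X, y, cv = 5, n_jobs = jobs )
    print( f"n_jobs = {jobs:2d}  took {time.perf_counter() - t0:7.1f} [s]"
           f"  mean CV-score = {scores.mean():.3f}" )

On a multi-core box the n_jobs = -1 run should finish visibly sooner, the gap being bounded by the Amdahl's-Law limits shown below.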
Anyway, stay tuned. The AI / ML-domain is full of [ *TIME, *SPACE ] problems, where * is not only polynomial, as you have experienced so far, but many times exponential, so get ready to fight Dragons :o)
The good news is that, for advanced AI/ML-processing, contemporary technology can provide infrastructures that have reasonable amounts of TB RAM for fighting the [PSPACE]/[EXPSPACE] battle-front and, at the same time, efficiently harness thousands of real CPU-s, so one can set n_jobs -- if research needs plus financing allow -- up to some large thousands of jobs!
Yes, large thousands -- so one can receive scikit results remarkably faster than without harnessing such infrastructure ( and a Cloud is not the answer here ... ( do not hesitate to drop me a note if you need such high-performance infrastructure up and running for your large-scale problems, ok? ) )
( all that if the Amdahl's-Law view of the process-graph's possible speedup makes sense to scale at all )
>>> procPAR_SEQ_percent = 1.00
#100 % PAR:
n_jobs: Speedup x| [hr] = [s] = [min]
-----------------------------:-------------|------------=--------------=-------------
1: 1.000000 x| 3.000 [hr] = 10800.00 [s] = 180.00 [min]
2: 1.999985 x| 1.500 [hr] = 5400.04 [s] = 90.00 [min]
4: 3.999941 x| 0.750 [hr] = 2700.04 [s] = 45.00 [min]
8: 7.999763 x| 0.375 [hr] = 1350.04 [s] = 22.50 [min]
16: 15.999052 x| 0.188 [hr] = 675.04 [s] = 11.25 [min]
32: 31.996208 x| 0.094 [hr] = 337.54 [s] = 5.63 [min]
64: 63.984833 x| 0.047 [hr] = 168.79 [s] = 2.81 [min]
128: 127.939347 x| 0.023 [hr] = 84.42 [s] = 1.41 [min]
256: 255.757504 x| 0.012 [hr] = 42.23 [s] = 0.70 [min]
512: 511.030934 x| 0.006 [hr] = 21.13 [s] = 0.35 [min]
1024: 996.309963 x| 0.003 [hr] = 10.84 [s] = 0.18 [min]
2048: 996.309963 x| 0.003 [hr] = 10.84 [s] = 0.18 [min]
4096: 996.309963 x| 0.003 [hr] = 10.84 [s] = 0.18 [min]
8192: 996.309963 x| 0.003 [hr] = 10.84 [s] = 0.18 [min]
>>> procPAR_SEQ_percent = 0.99
# 99 % PAR:
n_jobs: Speedup x| [hr] = [s] = [min]
-----------------------------:------------|------------=--------------=-------------
1: 1.000000 x| 3.000 [hr] = 10800.00 [s] = 180.00 [min]
2: 1.980183 x| 1.530 [hr] = 5507.50 [s] = 91.79 [min]
4: 3.883439 x| 0.795 [hr] = 2861.23 [s] = 47.69 [min]
8: 7.476428 x| 0.427 [hr] = 1538.09 [s] = 25.63 [min]
16: 13.912327 x| 0.243 [hr] = 876.53 [s] = 14.61 [min]
32: 24.425271 x| 0.152 [hr] = 545.74 [s] = 9.10 [min]
64: 39.258095 x| 0.106 [hr] = 380.35 [s] = 6.34 [min]
128: 56.375891 x| 0.083 [hr] = 297.66 [s] = 4.96 [min]
256: 72.093421 x| 0.071 [hr] = 256.31 [s] = 4.27 [min]
512: 83.771055 x| 0.065 [hr] = 235.63 [s] = 3.93 [min]
1024: 90.961156 x| 0.063 [hr] = 225.54 [s] = 3.76 [min]
2048: 90.961156 x| 0.063 [hr] = 225.54 [s] = 3.76 [min]
4096: 90.961156 x| 0.063 [hr] = 225.54 [s] = 3.76 [min]
8192: 90.961156 x| 0.063 [hr] = 225.54 [s] = 3.76 [min]
>>> procPAR_SEQ_percent = 0.98
# 98 % PAR:
n_jobs: Speedup x| [hr] = [s] = [min]
-----------------------------:------------|------------=--------------=-------------
1: 1.000000 x| 3.000 [hr] = 10800.00 [s] = 180.00 [min]
2: 1.960770 x| 1.559 [hr] = 5613.88 [s] = 93.56 [min]
4: 3.773532 x| 0.839 [hr] = 3020.80 [s] = 50.35 [min]
8: 7.017361 x| 0.479 [hr] = 1724.26 [s] = 28.74 [min]
16: 12.307131 x| 0.299 [hr] = 1075.99 [s] = 17.93 [min]
32: 19.751641 x| 0.209 [hr] = 751.85 [s] = 12.53 [min]
64: 28.315614 x| 0.164 [hr] = 589.79 [s] = 9.83 [min]
128: 36.153350 x| 0.141 [hr] = 508.75 [s] = 8.48 [min]
256: 41.960691 x| 0.130 [hr] = 468.24 [s] = 7.80 [min]
512: 45.625087 x| 0.124 [hr] = 447.98 [s] = 7.47 [min]
1024: 47.656029 x| 0.122 [hr] = 438.09 [s] = 7.30 [min]
2048: 47.656029 x| 0.122 [hr] = 438.09 [s] = 7.30 [min]
4096: 47.656029 x| 0.122 [hr] = 438.09 [s] = 7.30 [min]
8192: 47.656029 x| 0.122 [hr] = 438.09 [s] = 7.30 [min]
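The tables above appear to come from an Amdahl's-Law style model of a 3-hour baseline job. A minimal sketch of the plain Amdahl arithmetic ( the 3-hour T_serial and the list of n_jobs values are taken from the printed rows; the plateau near ~ 1000 jobs in the tables reflects some additional capping in the original model, so this plain formula only approximates those rows ):

# A minimal sketch of the Amdahl's-Law arithmetic behind tables like the above.
T_serial = 3 * 3600.                                     # [s] baseline, 3.000 [hr]

def amdahl_speedup( p, n ):                              # p ... parallel fraction
    return 1. / ( ( 1. - p ) + p / n )                   # n ... number of jobs

for p in ( 1.00, 0.99, 0.98 ):
    print( f"\n# {p:4.0%} PAR:\n   n_jobs:      Speedup x|     [hr]     =       [s]     =    [min]" )
    for n in ( 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ):
        S = amdahl_speedup( p, n )
        T = T_serial / S
        print( f"{n:9d}: {S:12.6f} x| {T/3600.:9.3f} [hr] = {T:11.2f} [s] = {T/60.:9.2f} [min]" )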
Last but not least, I run all AI/ML-engines from standard python, never from IPython, in production. With n_jobs = -1 and other acceleration tricks, the global model-search and hyper-parameter optimisation pipelines still span many days to robustly get to a best-generalising model's global minimum.
Upvotes: 4