spalxyz

Reputation: 41

Python n_jobs for large-scale problems

When I use the n_jobs parameter in cross_val_score(), it works well while my data is small, but as I enlarge the scale of my data, the multiprocessing no longer seems to work.

How could this be?

P.S.: I use IPython

Upvotes: 2

Views: 3248

Answers (1)

user3666197

Reputation: 1

n_jobs does the job and helps

Given that your "small scale" multiprocessing experience was satisfactory, there is no reason for n_jobs to cease accelerating the actual processing at larger scales.

Machine-learning processing pipelines in many cases exhibit O( N^f ) polynomial complexity, where N is the number of examples in the dataset of observations ( X ) and the exponent f is cited in various papers to be, for various learning algorithms, not less than ~ ( 2+ ).

As the dataset size grows, the time-to-compute expands polynomially, whereas even the n_jobs help can ( in the most optimistic cases ) reduce that growth by no more than a constant factor of 1 / { 2, 3, 4, .., 8, .., <max_cpu_cores> }
( and even this constant factor is further remarkably reduced by Amdahl's Law ( ref. Fig. below ), depending on how much of the work can run in parallel across the n_jobs v/s how much work still remains serially executed, irrespective of how many jobs one has set to launch and run. Tweaking O/S process priorities helps a bit, but large N-s still grow a lot, so the more The Time is Money applies here ).
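
( For concreteness, Amdahl's Law boils down to a one-liner -- a minimal sketch, where p, the parallelisable fraction of the work, and the function name amdahl_speedup are illustrative choices, not anything from scikit-learn: )

    def amdahl_speedup( p, n_jobs ):
        # theoretical overall speedup of a workload whose parallelisable
        # fraction is p ( 0.0 .. 1.0 ), distributed over n_jobs workers
        return 1.0 / ( ( 1.0 - p ) + p / float( n_jobs ) )

    # even with 98 % of the work parallelisable, the serial 2 % caps the
    # speedup at 1 / ( 1 - p ) = 50 x, however many jobs are thrown at it
    print( amdahl_speedup( 0.98, 8192 ) )        # ~ 49.7 x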

Some additional adverse effects come from the need to allocate replicated data-structures, so the [PSPACE] troubles add to the already painful [PTIME] headaches.

Even the most promising ML-learners grow in the [TIME] dimension of complexity at least as O( N log( N ) )


So, n_jobs always helps a lot,

you would have to wait much longer without it; just try to run the same problem with n_jobs = 1 and with n_jobs = -1.
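
( A minimal sketch to see the difference yourself -- the synthetic dataset, the SVC estimator and their sizes are illustrative choices, not taken from the question above; the import paths assume a recent scikit-learn: )

    import time
    from sklearn.datasets        import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm             import SVC

    X, y = make_classification( n_samples = 5000, n_features = 20, random_state = 0 )
    clf  = SVC( kernel = 'rbf' )

    for n_jobs in ( 1, -1 ):                     # serial v/s all-cores run
        t0 = time.time()
        cross_val_score( clf, X, y, cv = 10, n_jobs = n_jobs )
        print( "n_jobs = {0:>2d} took {1:8.2f} [s]".format( n_jobs, time.time() - t0 ) )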

Anyway, stay tuned. The AI / ML-domain is full of [ *TIME, *SPACE ] problems, where * is not only polynomial, as you have experienced so far, but many times exponential, so get ready to fight the Dragons :o)


Epilogue:

The good news is that, for advanced AI/ML-processing, contemporary technology can provide infrastructures that have reasonable amounts of TB RAM for fighting the [PSPACE]/[EXPSPACE] battle-front and, at the same time, can efficiently harness thousands of real CPUs, so one can set n_jobs -- if research needs plus financing allow -- up to many thousands of jobs!

Yes, many thousands -- so one can receive scikit-learn results remarkably faster than without harnessing such infrastructure ( and a Cloud is not the answer here ... ( do not hesitate to drop me a note if you need such high-performance infrastructure up and running for your large-scale problems, ok? ) )

( all that if the Amdahl's Law view of the process-graph's possible Speedup makes sense to scale )

[Figure: Amdahl's Law -- overall Speedup v/s number of processors, for several parallel fractions of the workload]


Pictures are nice,
but see the absolute figures to "SMELL THE SMOKE"
[hr] v/s [min]:

>>> procPAR_SEQ_percent = 1.00
                         #100 % PAR:

                       n_jobs:    Speedup x|       [hr] =          [s] =        [min]
-----------------------------:-------------|------------=--------------=-------------
                            1:   1.000000 x| 3.000 [hr] = 10800.00 [s] = 180.00 [min]
                            2:   1.999985 x| 1.500 [hr] =  5400.04 [s] =  90.00 [min]
                            4:   3.999941 x| 0.750 [hr] =  2700.04 [s] =  45.00 [min]
                            8:   7.999763 x| 0.375 [hr] =  1350.04 [s] =  22.50 [min]
                           16:  15.999052 x| 0.188 [hr] =   675.04 [s] =  11.25 [min]
                           32:  31.996208 x| 0.094 [hr] =   337.54 [s] =   5.63 [min]
                           64:  63.984833 x| 0.047 [hr] =   168.79 [s] =   2.81 [min]
                          128: 127.939347 x| 0.023 [hr] =    84.42 [s] =   1.41 [min]
                          256: 255.757504 x| 0.012 [hr] =    42.23 [s] =   0.70 [min]
                          512: 511.030934 x| 0.006 [hr] =    21.13 [s] =   0.35 [min]
                         1024: 996.309963 x| 0.003 [hr] =    10.84 [s] =   0.18 [min]
                         2048: 996.309963 x| 0.003 [hr] =    10.84 [s] =   0.18 [min]
                         4096: 996.309963 x| 0.003 [hr] =    10.84 [s] =   0.18 [min]
                         8192: 996.309963 x| 0.003 [hr] =    10.84 [s] =   0.18 [min]

>>> procPAR_SEQ_percent = 0.99
                          # 99 % PAR:


                       n_jobs:   Speedup x|       [hr] =          [s] =        [min]
-----------------------------:------------|------------=--------------=-------------
                            1:  1.000000 x| 3.000 [hr] = 10800.00 [s] = 180.00 [min]
                            2:  1.980183 x| 1.530 [hr] =  5507.50 [s] =  91.79 [min]
                            4:  3.883439 x| 0.795 [hr] =  2861.23 [s] =  47.69 [min]
                            8:  7.476428 x| 0.427 [hr] =  1538.09 [s] =  25.63 [min]
                           16: 13.912327 x| 0.243 [hr] =   876.53 [s] =  14.61 [min]
                           32: 24.425271 x| 0.152 [hr] =   545.74 [s] =   9.10 [min]
                           64: 39.258095 x| 0.106 [hr] =   380.35 [s] =   6.34 [min]
                          128: 56.375891 x| 0.083 [hr] =   297.66 [s] =   4.96 [min]
                          256: 72.093421 x| 0.071 [hr] =   256.31 [s] =   4.27 [min]
                          512: 83.771055 x| 0.065 [hr] =   235.63 [s] =   3.93 [min]
                         1024: 90.961156 x| 0.063 [hr] =   225.54 [s] =   3.76 [min]
                         2048: 90.961156 x| 0.063 [hr] =   225.54 [s] =   3.76 [min]
                         4096: 90.961156 x| 0.063 [hr] =   225.54 [s] =   3.76 [min]
                         8192: 90.961156 x| 0.063 [hr] =   225.54 [s] =   3.76 [min]

>>> procPAR_SEQ_percent = 0.98
                           # 98 % PAR:


                       n_jobs:   Speedup x|       [hr] =          [s] =        [min]
-----------------------------:------------|------------=--------------=-------------
                            1:  1.000000 x| 3.000 [hr] = 10800.00 [s] = 180.00 [min]
                            2:  1.960770 x| 1.559 [hr] =  5613.88 [s] =  93.56 [min]
                            4:  3.773532 x| 0.839 [hr] =  3020.80 [s] =  50.35 [min]
                            8:  7.017361 x| 0.479 [hr] =  1724.26 [s] =  28.74 [min]
                           16: 12.307131 x| 0.299 [hr] =  1075.99 [s] =  17.93 [min]
                           32: 19.751641 x| 0.209 [hr] =   751.85 [s] =  12.53 [min]
                           64: 28.315614 x| 0.164 [hr] =   589.79 [s] =   9.83 [min]
                          128: 36.153350 x| 0.141 [hr] =   508.75 [s] =   8.48 [min]
                          256: 41.960691 x| 0.130 [hr] =   468.24 [s] =   7.80 [min]
                          512: 45.625087 x| 0.124 [hr] =   447.98 [s] =   7.47 [min]
                         1024: 47.656029 x| 0.122 [hr] =   438.09 [s] =   7.30 [min]
                         2048: 47.656029 x| 0.122 [hr] =   438.09 [s] =   7.30 [min]
                         4096: 47.656029 x| 0.122 [hr] =   438.09 [s] =   7.30 [min]
                         8192: 47.656029 x| 0.122 [hr] =   438.09 [s] =   7.30 [min]
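
( For the record, the three tables above follow directly from Amdahl's Law; a minimal sketch that approximately regenerates them, assuming the same 3-hour serial baseline and -- as the flat tail above ~ 1k jobs suggests -- an effective worker count saturating near 996: )

    T_SERIAL = 3 * 3600.0                                  # the 3.000 [hr] baseline, in [s]

    def amdahl_speedup( p, n_jobs ):
        # theoretical speedup for parallelisable fraction p on n_jobs workers
        return 1.0 / ( ( 1.0 - p ) + p / float( n_jobs ) )

    for p in ( 1.00, 0.99, 0.98 ):
        print( "\n>>> procPAR_SEQ_percent = {0:.2f}".format( p ) )
        for n_jobs in ( 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192 ):
            S = amdahl_speedup( p, min( n_jobs, 996 ) )    # assumed ~1k saturation
            t = T_SERIAL / S
            print( "{0:>6d}: {1:>10.6f} x| {2:5.3f} [hr] = {3:>9.2f} [s] = {4:>7.2f} [min]".format(
                   n_jobs, S, t / 3600.0, t, t / 60.0 ) )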

Last but not least, I run all AI/ML-engines from standard python, never from IPython, in production. With n_jobs = -1 and other acceleration tricks, the global model-search and hyper-parameter optimisation pipelines still span many days to robustly get to a best-generalising model's global minimum.
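
( One more practical note: when such a script is launched with standard python on platforms where joblib has to spawn fresh interpreter processes ( Windows being the typical case ), the parallel call must sit under the usual entry-point guard -- a minimal sketch, with an illustrative toy dataset: )

    from sklearn.datasets        import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm             import SVC

    if __name__ == '__main__':                   # required by process-based backends
        X, y = make_classification( n_samples = 1000, random_state = 0 )
        print( cross_val_score( SVC(), X, y, cv = 5, n_jobs = -1 ) )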

Upvotes: 4
