ZK Zhao

Reputation: 21523

How can I speed up my program created in Jupyter Notebook?

I have a Python program that I wrote in a Jupyter Notebook. Because of the data size and the optimization algorithm I use, a 4-fold custom cross validation over some parameter range takes about 30 minutes to finish.

My computer's environment: CPU i5 3.3 GHz, 8 GB DDR3 RAM, SSD.

I'm wondering

  1. Is it possible to deploy it to a server to make it run a bit faster? (The data file is only about 30 MB, so I think I could upload both the data and the program.) This might also help others who want to use the program.

  2. Can I do anything to speed up the cross validation? It's kind of a manual process: I use sklearn.cross_validation.KFold to extract the train and test sets, then loop through each fold to build the model and test its result. Is it possible to encapsulate my model-building method and perform the cross validation in parallel?

Upvotes: 1

Views: 11020

Answers (1)

Gábor Erdős

Reputation: 3689

1: There are several paid HPC services, such as Amazon's, but this is off topic for SO.

2: The iteration of the cross validation can be done in parallel.

As the cross-validation folds are independent of each other, I would suggest something like this:

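import multiprocessing

def validation_function(args):
    # do_validation: build the model for one fold and test it.
    # `args` carries that fold's inputs (e.g. its train and test indices).
    ...

if __name__ == "__main__":
    # Placeholder: one entry per fold, filled in from your own fold-splitting step.
    fold_args = [...]
    # One worker process per CPU core; folds finish in arbitrary order.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as p:
        for _ in p.imap_unordered(validation_function, fold_args):
            pass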
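To make the pattern above concrete, here is a minimal, self-contained sketch using only the standard library. Everything in it is an assumption for illustration: `make_folds`, `evaluate_fold`, and `cross_validate` are hypothetical helpers, and the "model" is just the mean of the training targets, standing in for whatever your real model-building step does. Because `imap_unordered` yields results in completion order, each task carries its fold id so the scores can be put back in order.

```python
import multiprocessing
import random
import statistics


def make_folds(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and return n_folds (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    return [
        ([i for j, f in enumerate(folds) if j != k for i in f], folds[k])
        for k in range(n_folds)
    ]


def evaluate_fold(task):
    """Fit a toy model (the training mean) and return (fold_id, test MSE)."""
    fold_id, train_idx, test_idx, y = task
    prediction = statistics.fmean(y[i] for i in train_idx)
    mse = statistics.fmean((y[i] - prediction) ** 2 for i in test_idx)
    return fold_id, mse


def cross_validate(y, n_folds=4, processes=None):
    """Evaluate every fold in a separate worker and return scores in fold order."""
    tasks = [(k, tr, te, y) for k, (tr, te) in enumerate(make_folds(len(y), n_folds))]
    with multiprocessing.Pool(processes=processes) as pool:
        # Results arrive in completion order, so key them by fold id.
        results = dict(pool.imap_unordered(evaluate_fold, tasks))
    return [results[k] for k in range(n_folds)]


if __name__ == "__main__":
    toy_targets = [float(i % 7) for i in range(200)]  # made-up data
    print(cross_validate(toy_targets))
```

With four folds the best case is roughly a fourfold speedup, and only if at least four cores are free. Each task here ships the full data to the worker, which is simple but wasteful for large inputs. In a notebook, the worker function may also need to live in an importable .py file rather than a cell, depending on the platform's process start method.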

Upvotes: 1
