Srivatsan

Reputation: 9363

Multiprocessing in Python

I am writing a Python script (in Python 2.7) in which I need to generate around 500,000 uniform random numbers within a range. I need to do this 4 times, perform some calculations on them, and write out the 4 files.

At the moment I am doing: (this is just part of my for loop, not the entire code)

import numpy as np

random_RA = []
for i in xrange(500000):
    random_RA.append(np.random.uniform(6.061, 6.505)) # FINAL RANDOM RA

random_dec = []
for i in xrange(500000):
    # data_dec_1 is defined elsewhere in the full script
    random_dec.append(np.random.uniform(min(data_dec_1), max(data_dec_1))) # FINAL RANDOM 'dec'

to generate the random numbers within the range. I am running Ubuntu 14.04, and when I run the program I also open the system monitor to see how my 8 CPUs are being used. I notice that while the program is running, only 1 of the 8 CPUs seems to be working, at 100%. So the entire program takes me around 45 minutes to complete.

I noticed that it is possible to use all the CPUs to my advantage with the multiprocessing module.

I would like to know if this is enough in my example:

random_RA = []
for i in xrange(500000): 
    multiprocessing.Process()
    random_RA.append(np.random.uniform(6.061,6.505)) # FINAL RANDOM RA

i.e. would adding just the line multiprocessing.Process() be enough?

Upvotes: 0

Views: 162

Answers (2)

Vidhya G

Reputation: 2320

To get you started:

import multiprocessing
import random

def worker(i):
    # draw a single random number; the result is discarded in this toy example
    random.uniform(1, 100000)
    print i, 'done'


if __name__ == "__main__":
    for i in range(4):
        t = multiprocessing.Process(target=worker, args=(i,))
        t.start()
    print 'All the processes have been started.'

You must gate the t = multiprocessing.Process(...) call with the if __name__ == "__main__" check: when processes are spawned rather than forked (the default on Windows), each worker imports this program (module) again to find out what it has to do, and without the guard every child would spawn yet more processes ...
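The workers above throw away what they generate. As a rough sketch of my own (assuming each worker draws its own batch of numbers; the batch size is just an example), the results could be sent back through a multiprocessing.Queue and the processes joined afterwards:

import multiprocessing
import random

def worker(i, n_per_worker, queue):
    # draw this worker's share of the random numbers and send them back
    numbers = [random.uniform(1, 100000) for _ in xrange(n_per_worker)]
    queue.put((i, numbers))

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    processes = []
    for i in range(4):
        p = multiprocessing.Process(target=worker, args=(i, 500000, queue))
        p.start()
        processes.append(p)
    results = [queue.get() for _ in range(4)]  # drain the queue before joining
    for p in processes:
        p.join()                               # wait for all workers to exit
    print 'Collected', len(results), 'result lists.'

Note that the queue is drained before join() is called; joining first can deadlock if a child is still blocked trying to push a large result through the pipe.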

Just for completeness: generating 500,000 random numbers is not going to take you 45 minutes, so I assume there are some intensive calculations going on here; you may want to look at those closely.
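To put a rough number on that (my own sketch, not part of the original answer): even a single-process, vectorized draw of 500,000 uniforms with NumPy should finish in a fraction of a second on typical hardware, so the 45 minutes must be spent elsewhere.

import time
import numpy as np

start = time.time()
# one vectorized call replaces the 500,000-iteration Python loop from the question
random_RA = np.random.uniform(6.061, 6.505, size=500000)
print 'Generated %d numbers in %.3f seconds' % (len(random_RA), time.time() - start)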

Upvotes: 0

jbaiter

Reputation: 7099

If you use multiprocessing, you should avoid shared state (like your random_RA list) as much as possible. Instead, try to use a Pool and its map method:

from multiprocessing import Pool, cpu_count
import numpy as np

def generate_random_ra(x):
    # x is just the dummy index handed in by map
    return np.random.uniform(6.061, 6.505)

def generate_random_dec(x):
    # data_dec_1 comes from elsewhere in your script
    return np.random.uniform(min(data_dec_1), max(data_dec_1))

pool = Pool(cpu_count())
random_RA = pool.map(generate_random_ra, xrange(500000))
random_dec = pool.map(generate_random_dec, xrange(500000))
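One usage note, as my own addition rather than something this answer spells out: the pool should be closed and joined once the work is done, and because each individual task here is tiny, passing a larger chunksize to map can cut down on inter-process overhead:

pool = Pool(cpu_count())
try:
    # chunksize=10000 is a guess; bigger chunks mean fewer, larger work packets
    random_RA = pool.map(generate_random_ra, xrange(500000), chunksize=10000)
    random_dec = pool.map(generate_random_dec, xrange(500000), chunksize=10000)
finally:
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all worker processes to finish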

Upvotes: 1
