CiaranWelsh

Reputation: 7689

Unexpected output from parallel programming in Python: am I doing it correctly?

I am trying to learn parallel programming in Python. As a starting point I decided to write a program to test the multiprocessing module, before moving on to the 'multiprocess' module (the difference being, as I understand it, that 'multiprocess' can serialize a wider range of objects, since it uses dill rather than pickle).

The purpose of this program is to measure the time it takes to compute the square roots of the numbers in range(1000) using between 1 and 7 processes. I looped this program 80 times and generated the following graph.

[Figure: graph of the timing results]

I have a few questions about this.

  1. Have I implemented the parallelization correctly? According to this data, more processes do not seem to mean less time.
  2. Why are the standard deviations so big?

=========edit======

Question 3 is answered

  3. I ran this program twice, and both times my computer (32 GB RAM, i7 processor, 8 cores) crashed. Why might this be?

Also, if anyone has any further tips about parallel programming in Python, they would be gratefully received.

Cheers.

The code I used to generate the data:

from multiprocessing import Pool
import numpy
import time
import pandas
import os
import matplotlib.pyplot as plt

def sqrt(x):
    return numpy.sqrt(x)

num_repeats=100
num_processors=8

if __name__ == '__main__':
    for i in range(num_repeats):
        t=[]
        print 'repeat {}'.format(i)
        for j in range(num_processors):
            if j!=0:
                pool = Pool(j)
                start=time.time()
                results = [pool.apply_async(sqrt, (x,))for x in range(1000)]
                t.append( time.time()-start)
        df=pandas.DataFrame(pandas.Series(t))
        df= df.transpose()
        df.columns=['processor {}'.format(i) for i in range(num_processors-1)]
        df.to_csv(   os.path.join( os.getcwd(),'parallel_p_test.csv')  ,mode='a',header=True)

Upvotes: 3

Views: 123

Answers (1)

scytale

Reputation: 12641

For operations that complete very quickly, you will find that parallelisation does not bring much benefit, since a proportionally large amount of time is spent on the overhead of starting processes, passing data between them and synchronising them.

I would suggest retrying with something that takes a few seconds.
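As a rough sketch of what such a retry could look like (written for Python 3, whereas the code in the question is Python 2; the names heavy_sqrt_sum and time_pool and the workload sizes are made up for illustration): each task does enough pure-Python work to outweigh the dispatch overhead, and the timer only stops once every result has come back. Note that the code in the question stops the timer right after apply_async without ever calling get() on the results, so it appears to time the submission of the jobs and not the work itself.

```python
import math
import time
from multiprocessing import Pool


def heavy_sqrt_sum(n):
    """Sum the square roots of 0..n-1; deliberately slow, pure-Python work."""
    total = 0.0
    for k in range(n):
        total += math.sqrt(k)
    return total


def time_pool(num_processes, tasks):
    """Return the wall-clock seconds needed to finish all tasks."""
    # The context manager terminates the workers when the block exits.
    with Pool(num_processes) as pool:
        start = time.perf_counter()
        # map() blocks until every result is back, so the timer covers
        # the computation itself and not just the submission of jobs.
        pool.map(heavy_sqrt_sum, tasks)
        return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: 8 tasks; raise the number to lengthen each task.
    tasks = [1_000_000] * 8
    for n in (1, 2, 4):
        print("{} process(es): {:.2f} s".format(n, time_pool(n, tasks)))
```

How much the time drops with more processes will depend on the machine and on how long each task runs, so treat any numbers you get as indicative only.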

If your computer crashes then there is something seriously wrong. What OS? Is there anything in the logs?
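One thing that might be worth ruling out, though this is a guess and not something confirmed in this thread: the loop in the question creates a new Pool on every iteration and never closes it, so worker processes may pile up over the repeats. A minimal sketch of shutting each pool down before the next one is created (Python 3; square and run_once are illustrative names only):

```python
from multiprocessing import Pool


def square(x):
    """Trivial stand-in for the real work."""
    return x * x


def run_once(num_processes, values):
    """Run one batch in a fresh pool and shut the pool down afterwards."""
    pool = Pool(num_processes)
    try:
        return pool.map(square, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    for repeat in range(3):
        for n in range(1, 4):
            run_once(n, range(10))
    print("done")
```

If the crash still happens with the pools cleaned up like this, the OS logs are the next place to look.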

Upvotes: 4
