Mark Cook

Reputation: 179

Multiprocessing in Python questions

So I'm new to multiprocessing and mostly just trying to figure it out. I finally got a simple little program to work. Essentially I want it to open a CSV file and append i as a new row.

I got it working. Strangely though, it's a lot slower to multiprocess it than to not use multiprocessing at all.

Can someone explain this to me?

multiprocessing.py

import csv
import multiprocessing

def wrtCSV(i):
    with open('test.csv', 'a') as newFile:
        newFileWriter = csv.writer(newFile)
        newFileWriter.writerow([str(i)])

if __name__ == '__main__':
    jobs = []
    for i in range(100000):
        p = multiprocessing.Process(target=wrtCSV, args=(i,))
        jobs.append(p)
        p.start()

normal.py

import csv

def wrtCSV(i):
    with open('test.csv', 'a') as newFile:
        newFileWriter = csv.writer(newFile)
        newFileWriter.writerow([str(i)])

if __name__ == '__main__':
    for i in range(100000):
        wrtCSV(i)

Upvotes: 1

Views: 92

Answers (1)

Syed Rafay

Reputation: 1500

One reason normal.py is faster could be that the file cannot be accessed by all the processes at the same time (as pointed out in the comments), so the processes end up waiting on each other.
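One way around that contention, if all the rows are meant to end up in one file, is to let the worker processes build the rows and have the parent do all the writing. A minimal sketch of that idea (make_row is a hypothetical stand-in for the per-item work, not code from the question):

import csv
import multiprocessing

def make_row(i):
    # Workers only build the row; they never touch the file,
    # so no two processes contend for test.csv.
    return [str(i)]

if __name__ == '__main__':
    # A pool of worker processes produces the rows in parallel.
    with multiprocessing.Pool() as pool:
        rows = pool.map(make_row, range(100000))
    # The parent writes everything in a single pass.
    with open('test.csv', 'a', newline='') as newFile:
        csv.writer(newFile).writerows(rows)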

It is also important to use separate processes only when it's necessary. For example, if you want to run a deep learning classifier on 100 different images, doing it sequentially would take a long time, but if you divide the task between processes you will notice the speedup.

So I think you should only use processes when you want them to do some heavy processing (right now you are just writing to a single file). Each time you call multiprocessing.Process, it spawns a new process (a Process Control Block is created, the new process inherits resources, memory is reserved for its state, plus a few more overheads), and spawning a process is a slow procedure.
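For that kind of heavy, CPU-bound work, a multiprocessing.Pool is usually a better fit than one Process per item, because it starts a handful of workers and reuses them instead of paying the spawn cost for every task. A rough sketch, where heavy is a made-up CPU-bound stand-in and not code from the question:

import multiprocessing

def heavy(i):
    # Stand-in for real work (e.g. running a classifier on one image):
    # burn some CPU so the per-process overhead is amortised.
    return sum(k * k for k in range(10000)) + i

if __name__ == '__main__':
    # The Pool starts roughly one process per core and reuses them
    # for all 100000 tasks, instead of spawning 100000 processes.
    with multiprocessing.Pool() as pool:
        results = pool.map(heavy, range(100000))
    print(len(results))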

If you really want to compare the performance of the two, maybe make separate files and do some computations inside the loop.

normal.py

import csv

def wrtCSV(i):
    for j in range(100):
        if (j**2 * j + i * (j-j) + (j*i) + 1): # let's do some redundant calculations for benchmarking
            with open('test{}_{}.csv'.format(i, j), 'a') as newFile: # make individual files so the OS doesn't lock them
                newFileWriter = csv.writer(newFile)
                newFileWriter.writerow([str(i)])

if __name__ == '__main__':
    for i in range(100):
        wrtCSV(i)

multiprocessing.py

import csv
import multiprocessing

def wrtCSV(i):
    for j in range(100):
        if (j**2 * j + i * (j-j) + (j*i) + 1): # let's do some redundant calculations for benchmarking
            with open('test{}_{}.csv'.format(i, j), 'a') as newFile: # make individual files so the OS doesn't lock them
                newFileWriter = csv.writer(newFile)
                newFileWriter.writerow([str(i)])

if __name__ == '__main__':
    jobs = []
    for i in range(100):
        p = multiprocessing.Process(target=wrtCSV, args=(i,))
        jobs.append(p)
        p.start()

Check with these files; increase the range if you want.
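One caveat when timing the multiprocessing version: the parent should join() the children, otherwise it may finish (and stop your stopwatch) while they are still running. A rough timing sketch, reusing the same wrtCSV worker as above:

import csv
import time
import multiprocessing

def wrtCSV(i):  # same worker as in the script above
    for j in range(100):
        with open('test{}_{}.csv'.format(i, j), 'a') as newFile:
            csv.writer(newFile).writerow([str(i)])

if __name__ == '__main__':
    start = time.perf_counter()
    jobs = []
    for i in range(100):
        p = multiprocessing.Process(target=wrtCSV, args=(i,))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()  # wait for every child; without this the timing ends too early
    print('elapsed: {:.2f}s'.format(time.perf_counter() - start))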

Upvotes: 1
