Reputation: 11
I am currently scraping data from various sites. The code for the scrapers is stored in modules (x, y, z, a, b), where x.dump is a function that uses files to store the scraped data. Each dump function takes a single argument, 'input'. Note: the dump functions are not all the same.
I am trying to run each of these dump functions in parallel. The following code runs fine, but I have noticed that it still executes in serial order: x, then y, and so on.
Is this the correct way of going about the problem?
Are multithreading and multiprocessing the only native ways to do parallel programming in Python?
from multiprocessing import Process
import x.x as x
import y.y as y
import z.z as z
import a.a as a
import b.b as b
input = ""
f_list = [x.dump, y.dump, z.dump, a.dump, b.dump]
processes = []
for function in f_list:
    processes.append(Process(target=function, args=(input,)))
for process in processes:
    process.run()
for process in processes:
    process.join()
Upvotes: 1
Views: 123
Reputation: 32720
You should be calling process.start(), not process.run().
The start method does the work of starting the extra process and then running the run method in that process.
Upvotes: 0
Reputation: 249652
That's because run() is the method that implements the task itself; you're not meant to call it from outside like that. You are supposed to call start(), which spawns a new process that then calls run() in the other process, and returns control to you so you can do more work (and later join()).
Upvotes: 3