Reputation: 23
What I want to achieve:
I want to run 100 processes simultaneously because it takes too much time when I execute them in a for loop. For example: with a for loop, my function waits until the first request has been sent before sending the second one, but I want all 100 requests to be sent at the same time.
Actual results:
My function doesn't execute the processes simultaneously. It executes them one by one.
Expected results:
All 100 requests are sent at the same time.
What I've tried: I tried to run 100 processes simultaneously with the multiprocessing module, but it didn't go as expected. When I added multiprocessing to my code, it ran the same way as the version without it, executing one process at a time.
I might be missing something.
Here is the code that I used before.
Version without multiprocessing:
import openpyxl
import requests

def getRowCount(file, sheetName):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.max_row

def readData(file, sheetName, rownum, columno):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=rownum, column=columno).value

path = my_excel_file
sheetname = my_sheetname
rows = getRowCount(path, sheetname)

def funct():
    for i in range(2, 100):
        name = readData(path, sheetname, i, 1)
        data = {"name": "{}".format(name)}
        s = requests.Session()
        session = s.post(url, headers=s.headers, data=data)

if __name__ == "__main__":
    funct()
Here is the code that I'm trying to use to solve my issue.
Version with multiprocessing:
import openpyxl
import requests
from multiprocessing import Process, Lock

def getRowCount(file, sheetName):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.max_row

def readData(file, sheetName, rownum, columno):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=rownum, column=columno).value

path = my_excel_file
sheetname = my_sheetname
rows = getRowCount(path, sheetname)

def thread_task(lock, i):
    lock.acquire()
    name = readData(path, sheetname, i, 1)
    data = {"name": "{}".format(name)}
    s = requests.Session()
    session = s.post(url, headers=s.headers, data=data)
    lock.release()

if __name__ == '__main__':
    lock = Lock()
    processes = [Process(target=thread_task, args=(lock, i)) for i in range(2, 100)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
How can I execute all my processes simultaneously?
If there is a better solution than multiprocessing to achieve what I want, please let me know.
UPDATE
I can now run my processes simultaneously.
Updated version with multiprocessing:
import openpyxl
import requests
from multiprocessing import Process

def getRowCount(file, sheetName):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.max_row

def readData(file, sheetName, rownum, columno):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=rownum, column=columno).value

path = my_excel_file
sheetname = my_sheetname
rows = getRowCount(path, sheetname)

def thread_task(i):
    name = readData(path, sheetname, i, 1)
    data = {"name": "{}".format(name)}
    s = requests.Session()
    session = s.post(url, headers=s.headers, data=data)

if __name__ == '__main__':
    processes = [Process(target=thread_task, args=(i,)) for i in range(2, 100)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
I deleted the lock from my code and now it works. It seems the lock was preventing the other processes from running simultaneously, because each process held it for its entire task.
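For illustration, here is a minimal sketch (not part of my original script) showing why holding the Lock around the whole task serialized the workers: only one process can hold the lock at a time, so the others just wait their turn.

import time
from multiprocessing import Process, Lock

def serialized(lock, i):
    with lock:           # every worker blocks here until the previous one releases the lock
        time.sleep(1)    # stands in for the slow HTTP request
        print("worker", i, "done")

if __name__ == "__main__":
    lock = Lock()
    processes = [Process(target=serialized, args=(lock, i)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # Total runtime is about 5 seconds (one worker at a time) instead of about 1 second.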
Upvotes: 1
Views: 1482
Reputation: 33685
A lot of people do not care about learning multiprocessing in Python. They just want to get things done.
Some of them use GNU Parallel to do the multiprocessing for them. This way they just have to write a single-threaded program that takes an argument.
In your case you should be able to make this change:
import sys

def funct(i):
    name = readData(path, sheetname, i, 1)
    data = {"name": "{}".format(name)}
    s = requests.Session()
    session = s.post(url, headers=s.headers, data=data)

if __name__ == "__main__":
    funct(int(sys.argv[1]))
Then you can run:
seq 200 | parallel --jobs 100 python myscript.py {}
Upvotes: 0
Reputation: 173
You may want to try going up a layer and making the concurrent requests by feeding parameters into your script from a bash for loop. From what I can see in your code, there doesn't appear to be any dependency between the processes you want to run in parallel (the requests you are making), so it is safe to run these as 100 separate calls to the same Python script and let the operating system worry about the concurrency.
You should be able to achieve this with a bash for loop that passes the row number as a parameter and launches each call in the background:
for i in {2..100}
do
    run_script.py $i &   # the trailing & runs each call in the background
done
wait                     # wait for all background jobs to finish
Then, in your Python script, you would grab this parameter using the sys module:
import sys

if __name__ == "__main__":
    i = int(sys.argv[1])  # sys.argv[0] is the name of the script being run (e.g. run_script.py)
    funct(i)
Then rewrite your funct function to take the row number as a parameter. I hope this helps!
P.S. If you need to do more complicated process coordination, the invoke library can help a lot: http://www.pyinvoke.org/
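For instance, here is a minimal invoke sketch (it assumes invoke is installed via pip, and the task name send_all and the script name run_script.py are illustrative, not from your post):

# tasks.py -- run with `invoke send-all`
from invoke import task

@task
def send_all(c):
    # Launch each call in the background with &, then wait for all of them,
    # mirroring the bash loop above.
    c.run("for i in $(seq 2 99); do run_script.py $i & done; wait")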
Upvotes: 1