Reputation: 23
What I want to achieve:
I want to run 100 processes simultaneously because it takes too much time when I execute them in a for loop. For example: with a for loop, my function waits until the first request has been sent before sending the second one, but I want all 100 requests to be sent at the same time.
Actual results:
My function doesn't execute the processes simultaneously. It executes them one by one.
Expected results:
All 100 requests are sent at the same time.
What I've tried: I tried to run 100 processes simultaneously with the multiprocessing module, but it didn't go as expected. When I added multiprocessing to my code, it ran the same way as the version without it, executing one process at a time.
I might be missing something.
Here is the code that I used before.
Version without multiprocessing:
import openpyxl
import requests

def getRowCount(file, sheetName):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.max_row

def readData(file, sheetName, rownum, columno):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=rownum, column=columno).value

path = my_excel_file
sheetname = my_sheetname
rows = getRowCount(path, sheetname)

def funct():
    for i in range(2, 100):
        name = readData(path, sheetname, i, 1)
        data = {"name": "{}".format(name)}
        s = requests.Session()
        session = s.post(url, headers=s.headers, data=data)

if __name__ == "__main__":
    funct()
Here is the code that I'm trying to use to solve my issue.
Version with multiprocessing:
import openpyxl
import requests
from multiprocessing import Process, Lock

def getRowCount(file, sheetName):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.max_row

def readData(file, sheetName, rownum, columno):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=rownum, column=columno).value

path = my_excel_file
sheetname = my_sheetname
rows = getRowCount(path, sheetname)

def thread_task(lock, i):
    lock.acquire()
    name = readData(path, sheetname, i, 1)
    data = {"name": "{}".format(name)}
    s = requests.Session()
    session = s.post(url, headers=s.headers, data=data)
    lock.release()

if __name__ == '__main__':
    lock = Lock()
    processes = [Process(target=thread_task, args=(lock, i)) for i in range(2, 100)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
How can I execute all my processes simultaneously?
If there is a better solution than multiprocessing to achieve what I want, please let me know.
UPDATE
I can now run my processes simultaneously.
Updated version with multiprocessing:
import openpyxl
import requests
from multiprocessing import Process

def getRowCount(file, sheetName):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.max_row

def readData(file, sheetName, rownum, columno):
    workbook = openpyxl.load_workbook(file)
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=rownum, column=columno).value

path = my_excel_file
sheetname = my_sheetname
rows = getRowCount(path, sheetname)

def thread_task(i):
    name = readData(path, sheetname, i, 1)
    data = {"name": "{}".format(name)}
    s = requests.Session()
    session = s.post(url, headers=s.headers, data=data)

if __name__ == '__main__':
    processes = [Process(target=thread_task, args=(i,)) for i in range(2, 100)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
I deleted the lock from my code and now it works. It seems the lock was preventing the other processes from running simultaneously, because each process held it for its entire task.
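For illustration, here is a minimal sketch (not part of my original script) showing why holding the Lock around the whole task serialized the workers: only one process can hold the lock at a time, so the others just wait their turn.

import time
from multiprocessing import Process, Lock

def serialized(lock, i):
    with lock:           # every worker blocks here until the previous one releases the lock
        time.sleep(1)    # stands in for the slow HTTP request
        print("worker", i, "done")

if __name__ == "__main__":
    lock = Lock()
    processes = [Process(target=serialized, args=(lock, i)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # Total runtime is about 5 seconds (one worker at a time) instead of about 1 second.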
Upvotes: 1
Views: 1482
Reputation: 33685
A lot of people do not care about learning multiprocessing in Python. They just want to get things done.
Some of them use GNU Parallel to do the multiprocessing for them. This way they just have to write a single-threaded program that takes an argument.
In your case you should be able to make this change:
import sys

def funct(i):
    name = readData(path, sheetname, i, 1)
    data = {"name": "{}".format(name)}
    s = requests.Session()
    session = s.post(url, headers=s.headers, data=data)

if __name__ == "__main__":
    funct(int(sys.argv[1]))
Then you can run:
seq 200 | parallel --jobs 100 python myscript.py {}
Upvotes: 0
Reputation: 173
You may want to try going up a layer and making the concurrent requests by feeding parameters into your script from a bash for loop. From what I can see in your code, there doesn't appear to be any dependency between the processes you want to run in parallel (the requests you are making), so it is safe to run these as 100 separate calls to the same Python script and let the operating system worry about the concurrency.
You should be able to achieve this with a bash for loop that passes the row number as a parameter and launches each call in the background:
for i in {2..100}
do
    run_script.py $i &   # the trailing & runs each call in the background
done
wait                     # wait for all background jobs to finish
Then, in your Python script, you would grab this parameter using the sys module:
import sys

if __name__ == "__main__":
    i = int(sys.argv[1])  # sys.argv[0] is the name of the script being run (e.g. run_script.py)
    funct(i)
Then rewrite your funct function to take the row number as a parameter. I hope this helps!
P.S. If you need to do more complicated process coordination, the invoke library can help a lot: http://www.pyinvoke.org/
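For instance, here is a minimal invoke sketch (it assumes invoke is installed via pip, and the task name send_all and the script name run_script.py are illustrative, not from your post):

# tasks.py -- run with `invoke send-all`
from invoke import task

@task
def send_all(c):
    # Launch each call in the background with &, then wait for all of them,
    # mirroring the bash loop above.
    c.run("for i in $(seq 2 99); do run_script.py $i & done; wait")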
Upvotes: 1