Reputation: 3236
I am new to Python and I am trying to execute two tasks simultaneously. These tasks just fetch pages from a web server, and one can terminate before the other. I want to display the results only when all requests have been served. This is easy in a Linux shell, but I am getting nowhere with Python, and all the how-tos I have read look like black magic to a beginner like me. They all seem overly complicated compared with the simplicity of the bash script below.
Here is the bash script I would like to emulate in Python:
# First request (in background). Result stored in file /tmp/p1
wget -q -O /tmp/p1 "http://ursule/test/test.php?p=1&w=5" &
PID_1=$!
# Second request (also in background). Result stored in file /tmp/p2
wget -q -O /tmp/p2 "http://ursule/test/test.php?p=2&w=2" &
PID_2=$!
# Wait for the two processes to terminate before displaying the result
wait $PID_1 && wait $PID_2 && cat /tmp/p1 /tmp/p2
The test.php script is a simple:
<?php
printf('Process %s (sleep %s) started at %s ', $_GET['p'], $_GET['w'], date("H:i:s"));
sleep($_GET['w']);
printf('finished at %s', date("H:i:s"));
?>
The bash script returns the following:
$ ./multiThread.sh
Process 1 (sleep 5) started at 15:12:59 finished at 15:13:04
Process 2 (sleep 2) started at 15:12:59 finished at 15:13:01
What I have tried so far in Python 3:
#!/usr/bin/python3.2
import urllib.request, threading

def wget(address):
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    mystr = mybytes.decode("latin_1")
    print(mystr)
    url.close()

thread1 = threading.Thread(None, wget, None, ("http://ursule/test/test.php?p=1&w=5",))
thread2 = threading.Thread(None, wget, None, ("http://ursule/test/test.php?p=1&w=2",))

thread1.run()
thread2.run()
This doesn't work as expected as it returns:
$ ./c.py
Process 1 (sleep 5) started at 15:12:58 finished at 15:13:03
Process 1 (sleep 2) started at 15:13:03 finished at 15:13:05
Upvotes: 1
Views: 1292
Reputation: 3236
Following your advice I dived into the doc pages about multithreading and multiprocessing and, after running a couple of benchmarks, I came to the conclusion that multiprocessing was better suited for the job. It scales up much better as the number of threads/processes increases. Another problem I was confronted with was how to store the results of all these processes. Using multiprocessing.Queue did the trick. Here is the solution I came up with:
This snippet sends concurrent HTTP requests to my test rig, which pauses for one second before sending the answer back (see the PHP script above).
import urllib.request
from multiprocessing import Process, Queue

# wget(resultQueue, address): fetch the page and put the decoded body on the queue
def wget(resultQueue, address):
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    url.close()
    resultQueue.put(mybytes.decode("latin_1"))

numberOfProcesses = 20

# initialisation
proc = []
results = []
resultQueue = Queue()

# creation of the processes and their result queue
for i in range(numberOfProcesses):
    # The url just passes the process number (p) to my testing web server
    proc.append(Process(target=wget, args=(resultQueue, "http://ursule/test/test.php?p=" + str(i) + "&w=1")))
    proc[i].start()

# Wait for a process to terminate and get its result from the queue
for i in range(numberOfProcesses):
    proc[i].join()
    results.append(resultQueue.get())

# display results
for result in results:
    print(result)
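The queue plays the role of the /tmp/p1 and /tmp/p2 files in the bash version: each child process pushes its decoded page onto it, and the parent only prints once every process has been joined.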
Upvotes: 0
Reputation: 3704
Instead of using threading, it would be better to use the multiprocessing module, as each task is independent. You may also want to read about the GIL (http://wiki.python.org/moin/GlobalInterpreterLock).
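As a minimal sketch of that idea (reusing your wget function and the two test URLs from the question, and printing as each page arrives rather than collecting results), something like this should work:
import urllib.request
from multiprocessing import Process

def wget(address):
    # fetch one page in its own process and print it
    url = urllib.request.urlopen(address)
    print(url.read().decode("latin_1"))
    url.close()

if __name__ == '__main__':
    urls = ("http://ursule/test/test.php?p=1&w=5",
            "http://ursule/test/test.php?p=2&w=2")
    # one process per URL
    procs = [Process(target=wget, args=(u,)) for u in urls]
    for p in procs:
        p.start()
    # join() blocks until every process has finished
    for p in procs:
        p.join()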
Upvotes: 1