ripat

Reputation: 3236

Simple Multithreading in Python

I am new to Python and am trying to execute two tasks simultaneously. These tasks just fetch pages from a web server, and one can terminate before the other. I want to display the result only when all requests have been served. This is easy in a Linux shell, but I am getting nowhere with Python, and all the how-tos I have read look like black magic to a beginner like me. They all seem overcomplicated compared with the simplicity of the bash script below.

Here is the bash script I would like to emulate in Python:

# First request (in background). Result stored in file /tmp/p1
wget -q -O /tmp/p1 "http://ursule/test/test.php?p=1&w=5" &
PID_1=$!

# Second request (also in background). Result stored in file /tmp/p2
wget -q -O /tmp/p2 "http://ursule/test/test.php?p=2&w=2" &
PID_2=$!

# Wait for the two processes to terminate before displaying the result
wait $PID_1 && wait $PID_2 && cat /tmp/p1 /tmp/p2

The test.php script is simply:

<?php
printf('Process %s (sleep %s) started at %s ', $_GET['p'], $_GET['w'], date("H:i:s"));
sleep($_GET['w']);
printf('finished at %s', date("H:i:s"));
?>

The bash script returns the following:

$ ./multiThread.sh
Process 1 (sleep 5) started at 15:12:59 finished at 15:13:04
Process 2 (sleep 2) started at 15:12:59 finished at 15:13:01

What I have tried so far in Python 3:

#!/usr/bin/python3.2

import urllib.request, threading

def wget (address):
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    mystr = mybytes.decode("latin_1")
    print(mystr)
    url.close()

thread1 = threading.Thread(target=wget, args=("http://ursule/test/test.php?p=1&w=5",))
thread2 = threading.Thread(target=wget, args=("http://ursule/test/test.php?p=1&w=2",))

thread1.run()
thread2.run()

This doesn't work as expected, as the two requests are served sequentially:

$ ./c.py 
Process 1 (sleep 5) started at 15:12:58 finished at 15:13:03
Process 1 (sleep 2) started at 15:13:03 finished at 15:13:05

Upvotes: 1

Views: 1292

Answers (2)

ripat

Reputation: 3236

Following your advice I dived into the doc pages about multithreading and multiprocessing and, after running a couple of benchmarks, I came to the conclusion that multiprocessing was better suited for the job. It scales up much better as the number of threads/processes increases. Another problem I was confronted with was how to store the results of all these processes. Using a multiprocessing.Queue did the trick. Here is the solution I came up with:

This snippet sends concurrent HTTP requests to my test rig, which pauses for one second before sending the answer back (see the PHP script above).

import urllib.request
from multiprocessing import Process, Queue

# wget(resultQueue, address): fetch the page and put the decoded result on the queue
def wget(resultQueue, address):
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    url.close()
    resultQueue.put(mybytes.decode("latin_1"))

numberOfProcesses = 20

# initialisation
proc = []
results = []
resultQueue = Queue()

# create and start the processes
for i in range(numberOfProcesses):
    # The url just passes the process number (p) to my testing web-server
    proc.append(Process(target=wget, args=(resultQueue, "http://ursule/test/test.php?p=" + str(i) + "&w=1")))
    proc[i].start()

# Wait for each process to terminate and get its result from the queue
for i in range(numberOfProcesses):
    proc[i].join()
    results.append(resultQueue.get())

# display results
for result in results:
    print(result)
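
A note on the design choice: a multiprocessing.Queue is used because each Process runs in its own address space, so the workers cannot simply append to a list shared with the parent. For the short strings returned by test.php this is fine; with large results it is safer to drain the queue before (or while) joining the processes, since a process that has put data on a queue may not terminate until that data has been read.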

Upvotes: 0

shantanoo

Reputation: 3704

Instead of using threading, it would be better to use the multiprocessing module, as each task is independent. You may want to read more about the GIL (http://wiki.python.org/moin/GlobalInterpreterLock).
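
For what it's worth, a minimal sketch of that suggestion, reusing the two test URLs from the question (untested against that server), could look like this. Pool.map() only returns once every worker has finished, so nothing is printed before both requests have been served:

import urllib.request
from multiprocessing import Pool

def wget(address):
    # Fetch one URL and return the decoded body
    response = urllib.request.urlopen(address)
    try:
        return response.read().decode("latin_1")
    finally:
        response.close()

if __name__ == "__main__":
    urls = ["http://ursule/test/test.php?p=1&w=5",
            "http://ursule/test/test.php?p=2&w=2"]
    pool = Pool(len(urls))
    # map() blocks until all workers have returned their results, in order
    results = pool.map(wget, urls)
    pool.close()
    pool.join()
    for result in results:
        print(result)

The if __name__ == "__main__": guard is needed on platforms where multiprocessing starts child processes by re-importing the module.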

Upvotes: 1
