xozxan
xozxan

Reputation: 21

python threads - please explain

I'm trying to understand how threads work in python and I'm slightly confused around a few things.

I was under the impression that you could run different tasks at the same time in parallel when using threads?

The code below will demonstrate some of the issues I'm experiencing with threads. (I know there are better ways writing a port scanner but this will clarify the issues I'm having )

============ Start Example ============

    import socket
    import threading

    def test2(start,end):

         for i in range(int(start),int(end)):
             server = 'localhost'
             s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
             try:
                 s.connect((server,i))
                 print("port", i , "is open", threading.current_thread().name)
                 s.close()
             except socket.error as e:
                 s.close()
     def threader():
         print(threading.current_thread().name)

         if threading.current_thread().name == "Thread-0":
              test2(1,1000)

         elif threading.current_thread().name == "Thread-1":
             test2(1000,2000)

         elif threading.current_thread().name == "Thread-2":
             test2(2000,3000)

         elif threading.current_thread().name == "Thread-3":
             test2(3000,4000)

     for i in range(4):
         t = threading.Thread(target=threader, name= "Thread-"+str(i))
         t.start()

============ End Example ============

if I would scan 1000 ports with 1 thread it usually takes around 7-8 sec.

The problem with the code above is that this takes around 30 sec to execute.

Should it not take around 7-8 sec to execute if all threads are running in parallel and are scanning the same amount of ports?

I'd really appreciate if someone could explain what I'm doing wrong here.

TIA!

Upvotes: 2

Views: 96

Answers (1)

mpolednik
mpolednik

Reputation: 1023

One thing to consider is CPython's implementation of threading. There is no real parallelism in Python due to so called Global Interpreter Lock - GIL (you can find more information at https://wiki.python.org/moin/GlobalInterpreterLock).

This means that running more threads on a task can actually have worse performance outcome due to need for context switching between these threads and their serial runtime.

If you want a real speedup, you can either use different Python implementation with support for parallel processing (such as Jython) or look into multiprocessing module.

I've modified your code to give an example of the running time for various implementations:

2.47829794884 # serial port scan
2.47367095947 # threaded port scan
0.693325996399 # port scan using multiprocessing

The results are from Fedora 20, 4 core CPU laptop scanning 40000 ports (or 10000 ports per thread/process.

Upvotes: 4

Related Questions