Threading serializable operations in Python is slower than running them sequentially

Question

I am attempting to have two long-running operations run simultaneously in Python. They both operate on the same data set, but do not modify it. I have found that a threaded implementation runs slower than simply running them one after the other.

I have created a simplified example to show what I am experiencing.

Running this code as shown, with the runThreaded call active and the runNonThreaded call commented out (so the two operations run in threads), takes around 1:01 (minutes:seconds) on my machine. I see two CPUs running at around 50% for the whole run.

Commenting out the runThreaded call instead and enabling runNonThreaded (so the calculations run sequentially) takes around 35 seconds, with one CPU pegged at 100% for the whole run.
Both runs complete both calculations in full.

from datetime import datetime
import threading


# A minimal mutable counter; each worker increments its own instance.
class num:
    def __init__(self):
        self._num = 0

    def increment(self):
        self._num += 1

    def getValue(self):
        return self._num

class incrementNumber(threading.Thread):
    def __init__(self, number):
        self._number = number
        threading.Thread.__init__(self)

    def run(self):
        self.incrementProcess()

    def incrementProcess(self):
        # CPU-bound loop: 50 million pure-Python method calls and no I/O.
        for _ in range(50000000):
            self._number.increment()


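# Start both threads, then block until both have finished.
def runThreaded(x, y):
    x.start()
    y.start()
    x.join()
    y.join()

# Run the same two workloads one after the other in the calling thread.
def runNonThreaded(x, y):
    x.incrementProcess()
    y.incrementProcess()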

def main():
    t = datetime.now()

    x = num()
    y = num()
    incrementX = incrementNumber(x)
    incrementY = incrementNumber(y)

    # Swap which of these two lines is commented out to compare the modes.
    runThreaded(incrementX, incrementY)
    #runNonThreaded(incrementX, incrementY)

    print(x.getValue(), y.getValue())
    print(datetime.now() - t)


if __name__=="__main__":
    main()

Danica · Accepted Answer

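CPython has a so-called Global Interpreter Lock (GIL), which means that only one thread can execute Python bytecode at a time, even in a multithreaded program. Your two threads never run their loops in parallel; they take turns holding the lock, which is why you see two CPUs at around 50% each instead of one at 100%. The extra cost of handing the lock back and forth between threads running on different cores is what makes the threaded version slower rather than merely no faster. You might want to look into multiprocessing, which avoids this constraint by running the work in separate processes, each with its own interpreter and its own GIL.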
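A minimal sketch of the multiprocessing route, assuming Python 3 (it uses multiprocessing.Pool as a context manager and time.perf_counter). The names increment_process and run_in_processes are hypothetical. The loop counts into a local variable instead of reusing the question's num class, because each worker process works on its own copy of memory and has to return its result to the parent.

```python
import multiprocessing
import time


def increment_process(n):
    # Same CPU-bound loop as incrementProcess in the question, but counting
    # into a local variable: a worker process cannot mutate the parent's
    # objects, so the final count is returned instead.
    count = 0
    for _ in range(n):
        count += 1
    return count


def run_in_processes(n):
    # Two worker processes, one per counter. Each has its own interpreter
    # and its own GIL, so the two loops can occupy two cores at once.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(increment_process, [n, n])


if __name__ == "__main__":
    start = time.perf_counter()
    x_value, y_value = run_in_processes(50000000)
    print(x_value, y_value)
    print(time.perf_counter() - start)
```

The if __name__ == "__main__": guard matters here: when workers are started by re-importing the main module (the default on Windows, and on macOS since Python 3.8), leaving it out makes every child try to start its own pool.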

In practice, this means that multithreading in CPython mainly helps with I/O-bound work and other operations that spend their time waiting (on the network, the disk, or a timer), or with calls into C extensions that release the GIL while they do their work.
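To illustrate the I/O-bound case, here is a small sketch that uses time.sleep as a stand-in for blocking I/O (an assumption for illustration; real code would be waiting on sockets or files). CPython releases the GIL while a thread sleeps, so two waiting threads overlap. The helper names wait_for_io, run_sequential and run_threaded are hypothetical.

```python
import threading
import time


def wait_for_io(seconds):
    # Stand-in for a blocking I/O call; the GIL is released while sleeping,
    # so other threads are free to run during the wait.
    time.sleep(seconds)


def run_sequential(delay):
    # Two waits back to back: total time is roughly 2 * delay.
    start = time.perf_counter()
    wait_for_io(delay)
    wait_for_io(delay)
    return time.perf_counter() - start


def run_threaded(delay):
    # Two waits in parallel threads: total time is roughly 1 * delay.
    start = time.perf_counter()
    threads = [threading.Thread(target=wait_for_io, args=(delay,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2f s" % run_sequential(0.5))
    print("threaded:   %.2f s" % run_threaded(0.5))
```

Unlike the counting loop in the question, the threaded version here should finish in roughly half the time, because neither thread needs the GIL while it is waiting.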
