Reputation: 256
I am learning about threading in Python and wrote a short test program that creates 10 CSV files and writes 100k lines to each of them. I assumed it would be faster to have 10 threads each write their own file, but for some reason it is 2x slower than simply writing all the files in sequence.
I think this might have to do with the way the OS handles threads, but I am not sure. I am running this on Linux.
I would greatly appreciate it if someone could shed some light on why this is the case.
Multi-thread version:
import thread, csv
N = 10 #number of threads
exitmutexes = [False]*N
def filewriter(id_):
    with open('files/'+str(id_)+'.csv', 'wb') as f:
        writer = csv.writer(f, delimiter=',')
        for i in xrange(100000):
            writer.writerow(["efweef", "wefwef", "666w6efw", "6555555"])
    exitmutexes[id_] = True
for i in range(N):
    thread.start_new_thread(filewriter, (i,))
while False in exitmutexes: #checks whether all threads are done
    pass
Note: I have tried adding a sleep to the while-loop so that the main thread yields at intervals, but this had no substantial effect.
Regular version:
import time, csv
for i in range(10):
    with open('files2/'+str(i)+'.csv', 'wb') as f:
        writer = csv.writer(f, delimiter=',')
        for i in xrange(100000):
            writer.writerow(["efweef", "wefwef", "666w6efw", "6555555"])
Upvotes: 0
Views: 573
Reputation: 64827
There are several issues:
If you use more threads than there are available CPU cores, the total run time generally increases or, at best, stays the same. The only reason to use more threads than CPU cores is if you are consuming the results of the threads interactively or in a pipeline with other systems. There are edge cases where you can speed up a poorly designed, I/O-bound program by using threads, but a well-designed single-threaded program will most likely perform just as well or better.
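One way to check this on your own machine is to time both approaches with the same worker function. The following is only a rough sketch, written for Python 3 rather than the Python 2 used in the question (so `range` instead of `xrange`, text mode with `newline=""` instead of `'wb'`). The names `write_file`, `run_sequential`, `run_threaded` and `compare` are made up for illustration. It also waits with `Thread.join()` instead of the `while False in exitmutexes: pass` loop, since a spinning main thread competes with the writer threads for interpreter time and may account for part of the slowdown.

```python
import csv
import os
import tempfile
import threading
import time

ROW = ["efweef", "wefwef", "666w6efw", "6555555"]


def write_file(path, rows=100000):
    """Write `rows` copies of ROW to a CSV file at `path`."""
    # newline="" is what the csv module expects for text-mode files on Python 3
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=",")
        for _ in range(rows):
            writer.writerow(ROW)


def run_sequential(directory, n_files=10, rows=100000):
    """Write the files one after another; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n_files):
        write_file(os.path.join(directory, "seq_%d.csv" % i), rows)
    return time.perf_counter() - start


def run_threaded(directory, n_files=10, rows=100000):
    """Write each file in its own thread; return elapsed seconds."""
    start = time.perf_counter()
    threads = [
        threading.Thread(
            target=write_file,
            args=(os.path.join(directory, "thr_%d.csv" % i), rows),
        )
        for i in range(n_files)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks until the thread finishes, without busy-waiting
    return time.perf_counter() - start


def compare(n_files=10, rows=100000):
    """Print both timings, using a temporary directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as d:
        print("sequential: %.2fs" % run_sequential(d, n_files, rows))
        print("threaded:   %.2fs" % run_threaded(d, n_files, rows))
```

Calling `compare()` prints the two timings for the workload from the question. The numbers will depend on the machine, the disk and the Python version, so treat them as a local measurement rather than a general result.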
Upvotes: 4
Reputation: 910
Sounds like the dreaded GIL (Global Interpreter Lock):
"In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)"
This essentially means that each Python interpreter process (and thus each script) executes Python bytecode in only one thread at a time, so it behaves as if it were limited to a single logical core. No two threads will run Python code simultaneously unless you spawn separate processes.
Consult this page for more details: https://wiki.python.org/moin/GlobalInterpreterLock
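If you want to try the separate-processes route, a minimal sketch using the standard library's `multiprocessing.Pool` could look like the following. It is written for Python 3, and the helper names `write_one` and `write_all` are invented for this example. Whether it actually beats the sequential version depends on how much of the time goes to Python-level work versus the disk, so measure before relying on it.

```python
import csv
import os
from multiprocessing import Pool

ROW = ["efweef", "wefwef", "666w6efw", "6555555"]


def write_one(args):
    """Worker: write `rows` copies of ROW to `path`. Runs in a child process."""
    path, rows = args
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=",")
        for _ in range(rows):
            writer.writerow(ROW)
    return path


def write_all(directory, n_files=10, rows=100000, processes=None):
    """Write n_files CSV files into `directory` using a pool of worker processes."""
    jobs = [(os.path.join(directory, "%d.csv" % i), rows) for i in range(n_files)]
    # processes=None lets Pool choose a worker count based on the CPUs available
    with Pool(processes=processes) as pool:
        # map() blocks until every file has been written
        return pool.map(write_one, jobs)
```

Each worker process has its own interpreter and its own GIL, so the workers do not block one another the way threads do. Call `write_all(...)` from under an `if __name__ == "__main__":` guard, because some platforms start workers by re-importing the main module.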
Upvotes: 1