Roman

Reputation: 2355

I/O slowdown with multithreading in Python

I have a Python script that works on the following scheme: read a large file (e.g. a movie), compose selected information from it into a number of small temporary files, spawn subprocesses running a C++ application that performs the processing/calculations (separately for each file), and read the application's output. To speed up the script I used multiprocessing. However, it has a major drawback: each process has to keep in RAM its own copy of the whole large input file, so I can run only a few processes before running out of memory. I therefore decided to try multithreading instead (or some combination of multiprocessing and multithreading), since threads share the address space. As the Python part spends most of its time on file I/O or waiting for the C++ application to complete, I thought the GIL should not be an issue here. Nevertheless, instead of a gain in performance I observe a drastic slowdown, mainly in the I/O part.
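For context, the scheme I am aiming for looks roughly like this (a simplified sketch: './process_chunk', its command line and the way the data is split are placeholders, not the real code):

import os, subprocess, tempfile, threading

def worker(chunk, results, slot):
    # compose the selected information into a small temporary file
    with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
        tmp.writelines(chunk)
        tmp_name = tmp.name
    # run the C++ application on it ('./process_chunk' is a placeholder)
    output = subprocess.check_output(['./process_chunk', tmp_name])
    os.remove(tmp_name)
    results[slot] = output

def run_pipeline(large_file, nworkers):
    # the large input is read once and shared by all worker threads
    with open(large_file) as f:
        data = f.readlines()
    chunks = [data[i::nworkers] for i in range(nworkers)]
    results = [None] * nworkers
    threads = [threading.Thread(target=worker, args=(chunks[i], results, i))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

The point is that the data lives in memory only once, however many worker threads there are.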

I illustrate the problem with the following code (saved as test.py):

import sys, threading, tempfile, time

nthreads = int(sys.argv[1])

# worker thread that performs its share of the file I/O
class IOThread(threading.Thread):
    def __init__(self, thread_id, obj):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.obj = obj
    def run(self):
        run_io(self.thread_id, self.obj)

def gen_object(nlines):
    # build a list of nlines short text lines to be written to each temp file
    obj = []
    for i in range(nlines):
        obj.append(str(i) + '\n')
    return obj

def run_io(thread_id, obj):
    # each thread writes its share of 100 temporary files and reads them back
    ntasks = 100 // nthreads + (1 if thread_id < 100 % nthreads else 0)
    for i in range(ntasks):
        tmpfile = tempfile.NamedTemporaryFile('w+')
        with open(tmpfile.name, 'w') as ofile:
            for elem in obj:
                ofile.write(elem)
        with open(tmpfile.name, 'r') as ifile:
            content = ifile.readlines()
        tmpfile.close()

obj = gen_object(100000)
starttime = time.time()
threads = []
for thread_id in range(nthreads):
    threads.append(IOThread(thread_id, obj))
    threads[thread_id].start()
for thread in threads:
    thread.join()
runtime = time.time() - starttime
print('Runtime: {:.2f} s'.format(runtime))

When I run it with different numbers of threads, I get this:

$ python3 test.py 1
Runtime: 2.84 s
$ python3 test.py 1
Runtime: 2.77 s
$ python3 test.py 1
Runtime: 3.34 s
$ python3 test.py 2
Runtime: 6.54 s
$ python3 test.py 2
Runtime: 6.76 s
$ python3 test.py 2
Runtime: 6.33 s

Can someone explain this result to me, and give some advice on how to effectively parallelize I/O using multithreading?

EDIT:

The slowdown is not due to HDD performance, because:

1) the files are cached in RAM anyway;

2) the same operations with multiprocessing (not multithreading) do indeed get faster (almost by a factor of the number of CPUs); see the sketch below.
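For reference, the multiprocessing variant mentioned in 2) looks roughly like this: I replaced the thread start-up block at the bottom of test.py with something along these lines (a sketch, not the exact code I timed; it assumes the default fork start method on Linux):

import multiprocessing

# gen_object() and run_io() are the same as in test.py above; the only
# change is that every worker is a separate process instead of a thread
if __name__ == '__main__':
    obj = gen_object(100000)
    starttime = time.time()
    procs = [multiprocessing.Process(target=run_io, args=(i, obj))
             for i in range(nthreads)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print('Runtime: {:.2f} s'.format(time.time() - starttime))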

Upvotes: 4

Views: 1408

Answers (2)

Roman

Reputation: 2355

As I delved deeper into the problem, I ran comparison benchmarks for 4 different parallelisation methods, 3 of which use Python and 1 of which uses Java (the purpose of the test was not to compare the I/O machinery between languages, but to see whether multithreading can boost I/O operations). The test was performed on Ubuntu 14.04.3, with all files placed on a RAM disk.

Although the data are quite noisy, a clear trend is evident (see the chart; n=5 for each bar, error bars represent the SD): Python multithreading fails to boost I/O performance. The most probable reason is the GIL, and therefore there is no way around it.

[chart: benchmark runtimes for the four parallelisation methods]
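As a side note on the memory issue that started all this: with multiprocessing on Linux the large object does not necessarily have to be duplicated up front. A minimal sketch of the idea, assuming the workers are forked and only read the shared data (copy-on-write; CPython's reference counting still copies some pages over time):

import multiprocessing

big_data = None   # set in the parent before the pool is created

def worker(task_id):
    # the forked child inherits big_data from the parent; pages are only
    # copied when written to (and, gradually, by refcount updates)
    return task_id, len(big_data)

if __name__ == '__main__':
    big_data = [str(i) + '\n' for i in range(100000)]
    pool = multiprocessing.Pool(4)
    print(pool.map(worker, range(8)))
    pool.close()
    pool.join()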

Upvotes: 3

Stephane Martin

Reputation: 1642

I think your performance measures don't lie: you're asking your hard disk to do many things at the same time: reads, writes, an fsync when closing the files, and so on, across several files at once. This triggers a lot of physical hardware operations, and the more files you write at the same time, the more contention you get.

So the CPU ends up waiting for the disk operations to finish...

Moreover, if you don't have an SSD, the syncs actually mean physical head movements.

EDIT: it could be a GIL problem. When you iterate over elem in obj in run_io, you execute Python code between each write. ofile.write probably releases the GIL, so that the I/O doesn't block the other threads, but the lock is released/acquired on every iteration. So maybe your writes don't really run "concurrently".

EDIT2: to test this hypothesis, you can try replacing:

for elem in obj:
    ofile.write(elem)

with:

ofile.write("".join(obj))

and see if performance gets better.

Upvotes: -1
