Muthu Rg

Reputation: 662

Python multithreading performance

I have to process, say, thousands of records in an array. I wrote a normal for loop like this:

for record in records:
    results = processFile(record)
    write_output_record(o, results)

The script above took 427.270612955 seconds!

Since there is no dependency between these records, I used Python's multithreading module in the hope of speeding up the process. Below is my implementation:

import multiprocessing
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(processes=threads)
results = pool.map(processFile, records)
pool.close()
pool.join()
write_output(o, results)

My computer has 8 CPUs, and this takes 852.153398991 seconds. Can somebody tell me what I am doing wrong?

PS: The processFile function has no I/O; it mostly processes the record and sends back the updated record.

Upvotes: 0

Views: 110

Answers (1)

vishwarajanand

Reputation: 1071

Try using vmstat and verify whether it's a memory issue. Sometimes multithreading can slow your system down if each thread pushes up RAM usage by a significant amount.

Usually people encounter three types of issues: CPU bound (constrained by CPU computation), memory bound (constrained by available RAM), and I/O bound (constrained by network and disk I/O).

Upvotes: 1

Related Questions