Reputation: 23
How can I implement multithreading to make this process faster? The program generates 1 million random numbers and writes them to a file. It takes just over 2 seconds, but I'm wondering if multithreading would make it any faster.
import random
import time
startTime = time.time()
data = open("file2.txt", "a+")
for i in range(1000000):
    number = str(random.randint(1, 9999))
    data.write(number + '\n')
data.close()
executionTime = (time.time() - startTime)
print('Execution time in seconds: ' + str(executionTime))
Upvotes: 2
Views: 100
Reputation:
Write the string to the file all at once rather than writing each number individually. I have also tested this with multithreading, and it actually degrades performance: since you are writing to the same file, threading forces you to synchronize the writes, which hurts performance.
import time
startTime = time.time()
import random
size = 1000_000
# pre-declaring the list to save the cost of resizing it later
l = [None] * size
# l = list(str(random.randint(1, 99999)) for i in range(size))
for i in range(size):
    # l[i] = str(random.randint(1, 99999)) + "\n"
    l[i] = f"{random.randint(1, 99999)}\n"
# writing the data to the file all at once
with open('file2.txt', 'w+') as data:
    data.write(''.join(l))
executionTime = (time.time() - startTime)
print('Execution time in seconds: ' + str(executionTime))
Execution time in seconds: 1.943817138671875
Upvotes: 0
Reputation: 52337
The short answer: Not easily.
Here is an example of using a multiprocessing pool to speed up your code:
import random
import time
from multiprocessing import Pool
startTime = time.time()
def f(_):
    number = str(random.randint(1, 9999))
    data.write(number + '\n')

data = open("file2.txt", "a+")
with Pool() as p:
    p.map(f, range(1000000))
data.close()
executionTime = (time.time() - startTime)
print(f'Execution time in seconds: {executionTime}')
Looks good? Wait! This is not a drop-in replacement: it lacks synchronization between the processes, so not all 1000000 lines will be written (some are overwritten in the same buffer)! See Python multiprocessing safely writing to a file
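As a quick sanity check (a minimal sketch, assuming the file2.txt produced by the snippet above), you can count how many lines actually made it to disk after a run:
# hypothetical check: count the lines written by the unsynchronized version above
with open("file2.txt") as check:
    written = sum(1 for _ in check)
print(written)  # typically comes out below 1000000 when the writes race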
So we need to separate computing the numbers (in parallel) from writing them (in serial). We can do this as follows:
import random
import time
from multiprocessing import Pool
startTime = time.time()
def f(_):
    return str(random.randint(1, 9999))

with Pool() as p:
    output = p.map(f, range(1000000))

with open("file2.txt", "a+") as data:
    data.write('\n'.join(output) + '\n')

executionTime = (time.time() - startTime)
print(f'Execution time in seconds: {executionTime}')
With that fixed, note that this is not multithreading but uses multiple processes. You can change it to multithreading with a different pool object:
from multiprocessing.pool import ThreadPool as Pool
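For reference, here is a minimal sketch of the thread-based variant; it is the same logic as the process-based version above, only the pool class differs:
import random
import time
from multiprocessing.pool import ThreadPool as Pool  # threads instead of processes

startTime = time.time()

def f(_):
    return str(random.randint(1, 9999))

# same structure as the process-based version above
with Pool() as p:
    output = p.map(f, range(1000000))

with open("file2.txt", "a+") as data:
    data.write('\n'.join(output) + '\n')

executionTime = (time.time() - startTime)
print(f'Execution time in seconds: {executionTime}')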
On my system, the runtime drops from about 1 second to 0.35 seconds with the process pool. With the ThreadPool, however, it takes up to 2 seconds!
The reason is that Python's global interpreter lock prevents multiple threads from processing your task efficiently, see What is the global interpreter lock (GIL) in CPython?
To conclude: multithreading is not always the right answer.
The upside: with multiprocessing instead of multithreading, I did get a 3x speedup on my system.
Upvotes: 2