Reputation: 670
http://ls.pwd.io/2013/06/parallel-s3-uploads-using-boto-and-threads-in-python/
I tried the second solution from the link above to upload multiple files to S3. The code there doesn't call "join" on the threads, which means the main program can terminate while the threads are still running. With this approach the overall program finishes much faster, but there is no guarantee that the files were uploaded correctly. Is that really true? I am mostly concerned with the main program finishing quickly. What side effects can this approach have?
Upvotes: 1
Views: 9152
Reputation: 16174
Just having a little play, and I see multiprocessing takes a while to tear down a Pool, but otherwise there's not much in it.
Test code is:
from time import time, sleep
from multiprocessing.pool import Pool, ThreadPool
from threading import Thread

N_WORKER_JOBS = 10

def worker(x):
    # print("working on", x)
    sleep(0.1)

def mp_proc(fn, n):
    start = time()
    with Pool(N_WORKER_JOBS) as pool:
        t1 = time() - start
        pool.map(fn, range(n))
        start = time()
    t2 = time() - start
    print(f'Pool creation took {t1*1000:.2f}ms, teardown {t2*1000:.2f}ms')

def mp_threads(fn, n):
    start = time()
    with ThreadPool(N_WORKER_JOBS) as pool:
        t1 = time() - start
        pool.map(fn, range(n))
        start = time()
    t2 = time() - start
    print(f'ThreadPool creation took {t1*1000:.2f}ms, teardown {t2*1000:.2f}ms')

def threads(fn, n):
    threads = []
    for i in range(n):
        t = Thread(target=fn, args=(i,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

for test in [mp_proc, mp_threads, threads]:
    times = []
    for _ in range(7):
        start = time()
        test(worker, 10)
        times.append(time() - start)
    times = ', '.join(f'{t*1000:.2f}' for t in times)
    print(f'{test.__name__} took {times}ms')
I get the following timings (Python 3.7.3, Linux 5.0.8):
mp_proc ~220ms
mp_threads ~200ms
threads ~100ms
However, the teardown times are all ~100ms, which brings everything mostly into line.
I've poked around with logging and in the source, and it seems to be due to _handle_workers only checking every 100ms (it does status checks, then sleeps for 0.1 seconds).
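To illustrate why a check-then-sleep loop adds latency at shutdown, here is a toy sketch. It is not the multiprocessing source: handler, POLL_INTERVAL and measure_teardown are names made up for this example, and the 0.1 second interval is only assumed to mirror the sleep described above.

```python
import threading
from time import sleep, time

POLL_INTERVAL = 0.1  # assumption: mirrors the 0.1 s sleep described above

def handler(stop_requested):
    # Toy stand-in for a pool's housekeeping thread: check the flag, then sleep.
    while not stop_requested.is_set():
        sleep(POLL_INTERVAL)

def measure_teardown(delay_before_stop):
    # Start the handler, wait a bit, then time how long it takes to shut down.
    stop_requested = threading.Event()
    t = threading.Thread(target=handler, args=(stop_requested,))
    t.start()
    sleep(delay_before_stop)
    start = time()
    stop_requested.set()
    t.join()  # blocks until the handler wakes up and notices the flag
    return time() - start

for delay in (0.01, 0.05, 0.09):
    elapsed = measure_teardown(delay)
    print(f'stop after {delay*1000:.0f}ms -> teardown {elapsed*1000:.0f}ms')
```

The stop request is only noticed once the current sleep ends, so the measured teardown should come out at roughly POLL_INTERVAL minus the delay, and it is longest when the request arrives just after a check.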
With this knowledge, I can change the test worker to sleep for 0.095 seconds, and then everything is within 10% of each other. Also, given that this happens just once at pool teardown, it's easy to arrange for it not to happen in an inner loop.
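On the original question about skipping "join": assuming the threads are non-daemon (the default), Python still waits for them before the interpreter exits, but without waiting on them the main program has no way to know when they finished or whether any of them failed. A minimal sketch of one way to wait and collect failures follows, using concurrent.futures rather than the bare threads from the linked article. The upload function is a hypothetical stand-in for the real S3 call, and it fails on purpose for one file name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload(name):
    # Hypothetical stand-in for the real S3 upload; fails for one file on purpose.
    if name == 'bad.txt':
        raise IOError(f'upload of {name} failed')
    return name

def upload_all(names, max_workers=10):
    succeeded, failed = [], {}
    # Leaving the with block waits for every submitted job, like join().
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload, n): n for n in names}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                # result() re-raises any exception from the worker thread.
                succeeded.append(fut.result())
            except Exception as exc:
                failed[name] = exc
    return sorted(succeeded), failed

ok, failed = upload_all(['a.txt', 'b.txt', 'bad.txt'])
print('uploaded:', ok)
print('failed:', list(failed))
```

The cost is that the main program blocks until the slowest upload completes, which is the price of knowing that every file either made it or was reported as failed.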
Upvotes: 3