Reputation: 1621
I am trying to create a progress bar similar to tqdm's. Everything works, but I noticed that the calculation for every step of the progress bar (for big iterables, len > 50) takes a lot of time. This is my code:
def progressbar(iterable):
    def new(index):
        pass  # ... print the progress bar
    for i in range(len(iterable)):
        new(i)
        yield iterable[i]
The problem is that while the time new() takes to execute is negligible on small iterables, on larger iterables it becomes a problem (one that does not occur with the tqdm library). For example, the following code takes a few seconds to execute. It should be instant!
iterator = progressbar(range(1000))
for i in iterator: pass
Can you tell me a way to remedy this? Maybe by implementing multithreading?
Upvotes: 1
Views: 119
Reputation: 44108
It's not clear what the issue is (you are not showing all of your calculations), but I believe your approach can be improved in the way your progress bar handles the iterable it is being passed:
1. You assume that iterable is indexable, which may not always be the case.
2. If iterable is a generator, its length cannot be determined with the len function, nor would converting the generator to a list to get its length necessarily be efficient; it would probably defeat the purpose of having a progress bar, as in the example below. Your interface should therefore allow the user to pass an optional total parameter (as tqdm does) to explicitly specify the length of the iterable.
3. Do the calculations that do not depend on the index once, outside of new, so that new can quickly calculate, based on the value of the index argument, how wide the bar should be.
I would suggest the following changes:
from math import floor

def progressbar(iterable, total=None):
    def new(index):
        # print the progress bar for the item at position index
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')
    if total is None:
        # we must convert to a list to be able to take its length
        iterable = list(iterable)
        total = len(iterable)
    it = iter(iterable)
    width = 60  # width of progress bar
    division = total / width  # each division represents this many completions
    try:
        for i in range(total):
            # ensure next value exists before printing it:
            yield next(it)
            new(i)
    except StopIteration:
        pass
    print()

def fun():
    import time
    for i in range(1000):
        time.sleep(.03)
        yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
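A minimal sketch (not tqdm's implementation) of a variant that goes one step further with the suggestions above: it uses len() when the object supports it, so only generators and similar objects are turned into a list, and it redraws only when the number of filled cells changes, so at most width + 1 writes reach the terminal however long the iterable is. The file parameter and its sys.stderr default are assumptions of this sketch (tqdm also writes to stderr by default); tqdm itself throttles redraws by elapsed time through its mininterval parameter rather than by bar change.

```python
import io
import sys
from math import floor


def progressbar(iterable, total=None, width=60, file=sys.stderr):
    """Yield items from iterable, redrawing a text bar only when it changes.

    Sketch only: total falls back to len(iterable) when the object supports
    len(); the iterable is materialized into a list only as a last resort.
    """
    if total is None:
        try:
            total = len(iterable)          # lists, ranges, tuples, dicts ...
        except TypeError:
            iterable = list(iterable)      # generators have no len(): consume
            total = len(iterable)

    def draw(filled):
        file.write('|' + '.' * filled + ' ' * (width - filled) + '|\r')
        file.flush()

    shown = -1                             # number of cells currently drawn
    for count, item in enumerate(iterable, start=1):
        yield item
        # cells to fill after `count` items; clamp in case total was too small
        filled = min(width, floor(count * width / total)) if total else width
        if filled != shown:                # redraw only when the bar grows
            draw(filled)
            shown = filled
    file.write('\n')


# Capture the output in a buffer to count how many redraws happened.
buf = io.StringIO()
items = list(progressbar(range(1000), file=buf))
print(buf.getvalue().count('\r'))  # 61 redraws instead of 1000
```

Unlike the version above, this sketch keeps yielding items past total if the caller underestimates it (the bar simply stays full), which is closer to how tqdm behaves.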
Multithreading
Incorporating multithreading as a way of speeding up processing is problematic. The following (naive) attempt fails because, although multithreading is used to get the values from the generator function fun, the generator function still produces a value only once every .03 seconds. It should also be clear that if the iterable is, for example, a simple list, multithreading is not going to iterate the list any faster than a single thread:
from math import floor
from multiprocessing.pool import ThreadPool

def progressbar(iterable, total=None):
    def new(index):
        # print the progress bar for the item at position index
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')
    if total is None:
        # we must convert to a list to be able to take its length
        iterable = list(iterable)
        total = len(iterable)
    width = 60  # width of progress bar
    division = total / width  # each division represents this many completions
    with ThreadPool(20) as pool:
        # the threads only pass each value through; they cannot make fun() produce values faster
        for i, result in enumerate(pool.imap(lambda x: x, iterable)):
            yield result
            new(i)
    print()

def fun():
    import time
    for i in range(1000):
        time.sleep(.03)
        yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
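To check the claim that a thread pool cannot outrun the generator feeding it, a small timing sketch like the following can help. slow_source is a hypothetical stand-in for fun() with fewer, shorter sleeps so it finishes quickly. Both the plain loop and the ThreadPool.imap version should take at least n * delay seconds, because the pool's internal feeder thread still pulls values from the generator one at a time.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_source(n, delay):
    """Hypothetical stand-in for fun(): yields one value every `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield i


def timed(fn):
    """Return (fn(), elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def threaded(n, delay):
    with ThreadPool(8) as pool:
        # The sleeps happen inside the generator, which is consumed serially,
        # so the 8 threads mostly sit idle waiting for the next value.
        return list(pool.imap(lambda x: x, slow_source(n, delay)))


n, delay = 20, 0.02
serial, t_serial = timed(lambda: list(slow_source(n, delay)))
pooled, t_pooled = timed(lambda: threaded(n, delay))
print(f"serial: {t_serial:.2f}s  pooled: {t_pooled:.2f}s")
```

Both timings should come out close to each other, confirming that the bottleneck is the producer, not the consumer.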
What would have sped up processing is if the generator function itself had used multithreading. But, of course, the progress bar has no control over how the iterable is created:
from math import floor
from multiprocessing.pool import ThreadPool

def progressbar(iterable, total=None):
    def new(index):
        # print the progress bar for the item at position index
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')
    if total is None:
        # we must convert to a list to be able to take its length
        iterable = list(iterable)
        total = len(iterable)
    it = iter(iterable)
    width = 60  # width of progress bar
    division = total / width  # each division represents this many completions
    try:
        for i in range(total):
            # ensure next value exists before printing it:
            yield next(it)
            new(i)
    except StopIteration:
        pass
    print()

def fun():
    import time
    def fun2(i):
        time.sleep(.03)
        return i
    # here the threads run fun2 concurrently, so up to 20 sleeps overlap
    with ThreadPool(20) as pool:
        for i in pool.imap(fun2, range(1000)):
            yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
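The same idea, parallelizing inside the producer, can also be written with the standard concurrent.futures module instead of multiprocessing.pool.ThreadPool. This is a sketch under the assumption that the per-item work is I/O-bound (simulated here with time.sleep, which releases the GIL); work and parallel_source are hypothetical names. Executor.map returns results in input order, so a progress bar consuming this generator still sees the values sequentially.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work(i, delay=0.05):
    """Hypothetical I/O-bound task; sleeping releases the GIL, so threads overlap."""
    time.sleep(delay)
    return i


def parallel_source(n, workers=10):
    """Generator that parallelizes its own work; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        yield from ex.map(work, range(n))


start = time.perf_counter()
values = list(parallel_source(40))
elapsed = time.perf_counter() - start
print(values[0], values[-1], f"{elapsed:.2f}s")  # a serial loop would need about 2 s
```

Such a generator can be passed to the progress bar with total=40 just like fun() above. Note that Executor.map submits every task up front, which is fine for small inputs but means finished results can pile up in memory if the consumer is slow.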
Upvotes: 2