Giuppox

Reputation: 1621

Implementing Python multithreading in the calculation for a progress bar

I am trying to create a progress bar similar to tqdm's. Everything works fine, but I noticed that the calculation for every step of the progress bar takes a lot of time for big iterables (len > 50). This is my code:

def progressbar(iterable):
    def new(index):
        ...  # print the progress bar
    for i in range(len(iterable)):
        new(i)
        yield iterable[i]

The problem is that while on small iterables the time that new() takes to execute is negligible, on larger iterables it becomes a problem (which does not happen with the tqdm library). For example, the following code takes a few seconds to execute, when it should be nearly instant:

iterator = progressbar(range(1000))
for i in iterator: pass

Can you tell me a way to remedy this? Maybe by implementing multithreading?

Upvotes: 1

Views: 119

Answers (1)

Booboo

Reputation: 44108

It's not clear what the issue is (you are not showing all of your calculations), but I believe your approach can be improved in the way your progress bar handles the iterable it is passed:

  1. First, you are assuming that the iterable is indexable, which may not always be the case.
  2. If it is a generator function, the length cannot be determined with the len function. Converting the generator to a list just to get its length is not necessarily efficient, and would probably defeat the purpose of having a progress bar, as in the example below. Your interface should therefore allow the user to pass an optional total parameter (as tqdm does) to explicitly specify the length of the iterable.
  3. You can do some upfront calculations outside of function new so that new can quickly compute, based on the value of the index argument, how wide the bar should be.

I would suggest the following changes:

from math import floor

def progressbar(iterable, total=None):
    def new(index):
        # print the progress bar for the given index
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')

    if total is None:
        # we must convert to a list to learn the length
        iterable = list(iterable)
        total = len(iterable)
    it = iter(iterable)

    width = 60  # width of the progress bar
    division = total / width  # each division represents this many completions

    try:
        for i in range(total):
            # ensure the next value exists before printing the bar:
            yield next(it)
            new(i)
    except StopIteration:
        pass
    print()

def fun():
    import time
    for i in range(1000):
        time.sleep(.03)
        yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
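One further refinement, sketched here under the same assumptions as the code above: since the bar has only width + 1 visibly distinct states, the redraw can be skipped entirely whenever the number of filled divisions has not changed. That caps terminal output at roughly width + 1 writes no matter how long the iterable is, which addresses the per-step cost the question complains about:

```python
import sys
from math import floor

def progressbar(iterable, total=None, width=60):
    """Yield items from iterable, redrawing the bar only when it visibly changes."""
    if total is None:
        iterable = list(iterable)  # fall back to materializing the iterable
        total = len(iterable)
    division = total / width       # completions per bar division
    last_n = -1                    # filled divisions at the last redraw
    for i, value in enumerate(iterable):
        n = floor(i / division + .5)
        if n != last_n:            # nothing visible changed -> skip the print
            sys.stdout.write('|' + '.' * n + ' ' * (width - n) + '|\r')
            last_n = n
        yield value
    sys.stdout.write('\n')
```

With this change, iterating progressbar(range(1000)) performs at most 61 screen writes instead of 1000.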

Multithreading

Incorporating multithreading as a way of speeding up processing is problematic. The following is a (naive) attempt that fails: although multithreading is being used to get the values from the generator function fun, the generator function is still generating values only once every .03 seconds. It should also be clear that if the iterable is, for example, a simple list, multithreading is not going to iterate the list any faster than a single thread:

from math import floor
from multiprocessing.pool import ThreadPool


def progressbar(iterable, total=None):
    def new(index):
        # print the progress bar for the given index
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')

    if total is None:
        # we must convert to a list to learn the length
        iterable = list(iterable)
        total = len(iterable)

    width = 60  # width of the progress bar
    division = total / width  # each division represents this many completions

    with ThreadPool(20) as pool:
        # the pool threads just pass values through; production is still serial
        for i, result in enumerate(pool.imap(lambda x: x, iterable)):
            yield result
            new(i)
        print()

def fun():
    import time
    for i in range(1000):
        time.sleep(.03)
        yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
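A quick timing check makes this concrete (a sketch only; slow_gen is a hypothetical stand-in for fun): pushing a rate-limited generator through a thread pool does not make it produce values any faster, because the sleeps inside the generator remain strictly sequential:

```python
import time
from multiprocessing.pool import ThreadPool

def slow_gen(n):
    # produces one value every 10 ms, no matter who is consuming it
    for i in range(n):
        time.sleep(.01)
        yield i

# plain single-threaded consumption
start = time.perf_counter()
serial_values = list(slow_gen(50))
serial = time.perf_counter() - start

# consumption through a 20-thread pool: the pool still has to wait for
# each value to be produced, so the 50 sleeps happen one after another
start = time.perf_counter()
with ThreadPool(20) as pool:
    pooled_values = list(pool.imap(lambda x: x, slow_gen(50)))
pooled = time.perf_counter() - start

# both runs are dominated by the same ~0.5 s of sleeping
```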

What would have sped up processing is if the generator function itself had used multithreading. But, of course, the progress bar has no control over how the iterable is being created:

from math import floor
from multiprocessing.pool import ThreadPool


def progressbar(iterable, total=None):
    def new(index):
        # print the progress bar for the given index
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')

    if total is None:
        # we must convert to a list to learn the length
        iterable = list(iterable)
        total = len(iterable)
    it = iter(iterable)

    width = 60  # width of the progress bar
    division = total / width  # each division represents this many completions

    try:
        for i in range(total):
            # ensure the next value exists before printing the bar:
            yield next(it)
            new(i)
    except StopIteration:
        pass
    print()


def fun():
    import time

    def fun2(i):
        time.sleep(.03)
        return i

    with ThreadPool(20) as pool:
        for i in pool.imap(fun2, range(1000)):
            yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
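For completeness, the same producer-side threading can also be written with the standard concurrent.futures module; slow_square and threaded_values below are hypothetical names used only for this sketch:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(i):
    # stand-in for I/O-bound work; each call blocks for 10 ms
    time.sleep(.01)
    return i * i

def threaded_values(n, workers=20):
    # Executor.map preserves the input order, so results arrive as 0, 1, 4, ...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(slow_square, range(n))

start = time.perf_counter()
results = list(threaded_values(100))
elapsed = time.perf_counter() - start
# the 100 sleeps overlap across 20 workers, so this finishes in well under a second
```

Either pool works here; concurrent.futures is simply the more commonly documented interface for this pattern.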

Upvotes: 2
