Red Sparrow

Reputation: 397

How to get progress bar with tqdm in a for loop over directory

I am trying to conditionally load some files from a directory, and I would like a tqdm progress bar for the process. I am currently running this:

import os
from tqdm import tqdm

loaddir = r'D:\Folder'
# loop the files in the directory
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    for name in tqdm(files):
        if name.startswith('Test'):
            pass  # do things

which gives

Data load initiated

  0%|          | 0/6723 [00:00<?, ?it/s]
  0%|          | 26/6723 [00:00<00:28, 238.51it/s]
  1%|          | 47/6723 [00:00<00:31, 213.62it/s]
  1%|          | 72/6723 [00:00<00:30, 220.84it/s]
  1%|▏         | 91/6723 [00:00<00:31, 213.59it/s]
  2%|▏         | 115/6723 [00:00<00:30, 213.73it/s]

This has two problems:

  1. Every progress update appears on a new line in my IPython console in Spyder.
  2. I am timing the loop over all files rather than only the files that start with 'Test', so the progress and remaining time are not accurate.

However, if I try this:

loaddir = r'D:\Folder'
# loop the files in the directory
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    for name in files:
        if tqdm(name.startswith('Test')):
            pass  # do things

I get the following error.

Traceback (most recent call last):

  File "<ipython-input-80-b801165d4cdb>", line 21, in <module>
    if tqdm(name.startswith('Probe')):

TypeError: 'NoneType' object cannot be interpreted as an integer

I would like a single-line progress bar that advances only when a file name passes the startswith check.

----UPDATE----

I also found that tqdm can be used in a list comprehension like this:

files = [f for f in tqdm(files) if f.startswith('Test')]

This tracks progress in a list comprehension by wrapping the iterable with tqdm. However, in Spyder this results in a separate line for each progress update.

----UPDATE2----

It actually works fine in Spyder. Occasionally, if the loop fails, it goes back to printing each progress update on a new line, but I haven't seen this often since the latest updates.

Upvotes: 3

Views: 14380

Answers (3)

Elyasaf755

Reputation: 3539

Specify position=0 and leave=True like this:

for i in tqdm(range(10), position=0, leave=True):
    pass  # Some code

Or in a list comprehension:

nums = [i for i in tqdm(range(10), position=0, leave=True)]

It's worth mentioning that you can make `position=0` and `leave=True` the defaults, so you don't need to specify them each time:

from tqdm import tqdm
from functools import partial

tqdm = partial(tqdm, position=0, leave=True) # this line does the magic

# for loop
for i in tqdm(range(10)):
    pass  # Some code

# list comprehension
nums = [i for i in tqdm(range(10))]

Upvotes: 0

casper.dcl

Reputation: 14849

First, the answer:

loaddir = r'D:\surfdrive\COMSOL files\Batch folder\Current batch simulation files'
# loop the files in the directory
print('Data load initiated')
for subdir, dirs, files in os.walk(loaddir):
    # keep only the files of interest so tqdm counts just those
    files = [f for f in files if f.startswith('Test')]
    for name in tqdm(files):
        pass  # do things

This will work in any decent environment (including a bare terminal). The solution is to not give tqdm the unused filenames. You may find https://github.com/tqdm/tqdm/wiki/How-to-make-a-great-Progress-Bar insightful.
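Note that the loop above starts a new bar for every subdirectory that os.walk visits. If a single bar over the whole tree is preferred, one option is to collect all matching paths first and wrap that one list. This is only a sketch: `find_matching_files` and `load_matching_files` are hypothetical helper names, the loop body is a placeholder for the real loading code, and the `ImportError` fallback is there just so the snippet runs where tqdm is not installed.

```python
import os

try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm (no bar is drawn).
    def tqdm(iterable, **kwargs):
        return iterable


def find_matching_files(root, prefix):
    """Return full paths of all files under root whose name starts with prefix."""
    matches = []
    for subdir, _dirs, files in os.walk(root):
        for name in files:
            if name.startswith(prefix):
                matches.append(os.path.join(subdir, name))
    return matches


def load_matching_files(root, prefix="Test"):
    """Process the matching files with one progress bar for the whole tree."""
    paths = find_matching_files(root, prefix)
    loaded = []
    # tqdm sees only the filtered list, so the total and ETA reflect real work.
    for path in tqdm(paths, desc="Loading"):
        loaded.append(os.path.basename(path))  # placeholder for the real work
    return loaded
```

The trade-off is that the directory tree is walked completely before the bar appears, which is usually cheap compared with loading the files.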

Second, the multi-line output is a well-known issue: some environments are broken (https://github.com/tqdm/tqdm#faq-and-known-issues) in that they do not support carriage return (\r).
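To illustrate why carriage-return support matters, here is a minimal sketch of how a single-line bar can be drawn with nothing but `\r`. It is not tqdm's actual implementation; `render_bar` and `show_progress` are hypothetical names made up for this example.

```python
import sys
import time


def render_bar(done, total, width=20):
    """Build one line of a text progress bar for done out of total steps."""
    filled = width * done // total
    percent = 100 * done // total
    return f"{percent:3d}%|{'#' * filled}{' ' * (width - filled)}| {done}/{total}"


def show_progress(total, stream=sys.stderr, delay=0.0):
    """Redraw the bar in place once per step, then finish with a newline."""
    for done in range(total + 1):
        # '\r' moves the cursor back to the start of the line. A console that
        # honours it overwrites the previous bar; one that does not will show
        # every redraw as a separate line.
        stream.write("\r" + render_bar(done, total))
        stream.flush()
        time.sleep(delay)
    stream.write("\n")
```

Calling `show_progress(50, delay=0.05)` in a bare terminal should redraw one line in place; in a console without `\r` support the same call produces the stacked output shown in the question.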

The correct links for this problem in Spyder are https://github.com/tqdm/tqdm/issues/512 and https://github.com/spyder-ide/spyder/issues/6172

Upvotes: 4

Carlos Cordoba

Reputation: 34186

(Spyder maintainer here) This is a known limitation of tqdm progress bars in Spyder. I'd recommend opening an issue about it in its GitHub repository.

Upvotes: 0
