Reputation: 131
I'm iterating over a large group of files inside a directory tree using a for loop.
While doing so, I want to monitor the progress through a progress bar in the console. So, I decided to use tqdm for this purpose.
Currently, my code looks like this:
for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]
    for fname in fileList:
        *****
Output:
Scanning Directory....
43it [00:23, 11.24 it/s]
So, my problem is that it is not showing a progress bar. I want to know how to use tqdm properly here and get a better understanding of how it works. I would also like to know whether there are any alternatives to tqdm that could be used here.
Upvotes: 12
Views: 28667
Reputation: 388
You can show progress over all the files inside a directory path with tqdm this way:
import os
from tqdm import tqdm

target_dir = os.path.join(os.getcwd(), "..Your path name")  # it has 212 files
for r, d, f in os.walk(target_dir):
    for file in tqdm(f, total=len(f)):
        filepath = os.path.join(r, file)
        # f'Your operation on file..{filepath}'
20%|████████████████████ | 42/212 [05:07<17:58, 6.35s/it]
Like this you will get progress...
Upvotes: 3
Reputation: 2967
Here is a more succinct way of precomputing the number of files and then providing a status bar on the files:
import os
from tqdm import tqdm

file_count = sum(len(files) for _, _, files in os.walk(folder))  # Get the number of files
with tqdm(total=file_count) as pbar:  # Do tqdm this way
    for root, dirs, files in os.walk(folder):  # Walk the directory
        for name in files:
            pbar.update(1)  # Increment the progress bar
            # Process the file in the walk
Upvotes: 13
Reputation: 16610
As explained in the documentation, this is because tqdm needs a total to measure progress against, and os.walk() does not provide one. Depending on what you do with your files, you can use either the file count or the total file size.
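The size-based option can be sketched as follows. This is only a minimal sketch under assumptions: `iter_files`, `tree_size` and `process_by_size` are hypothetical helper names, the processing step is a placeholder, and it assumes every file is readable and unchanged between the two passes. It uses tqdm's `total`, `unit`, `unit_scale` and `unit_divisor` parameters and advances the bar with `update()`.

```python
import os


def iter_files(folder):
    """Yield the path of every file under folder."""
    for dirpath, _dirs, files in os.walk(folder):
        for name in files:
            yield os.path.join(dirpath, name)


def tree_size(folder):
    """First pass: total size in bytes of all files under folder."""
    return sum(os.path.getsize(path) for path in iter_files(folder))


def process_by_size(folder):
    """Second pass: advance the bar by each file's size in bytes."""
    # tqdm is third-party; importing it here keeps the helpers above stdlib-only.
    from tqdm import tqdm

    done = 0
    with tqdm(total=tree_size(folder), unit="B", unit_scale=True,
              unit_divisor=1024) as pbar:
        for path in iter_files(folder):
            size = os.path.getsize(path)
            # Placeholder: do the real work on `path` here.
            pbar.update(size)
            done += size
    return done
```

Counting bytes tends to give a steadier ETA than counting files when file sizes vary a lot, at the cost of one extra `getsize` call per file.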
Other answers suggested converting the os.walk() generator into a list, so that you get a __len__ method. However, this can cost you a lot of memory, depending on the total number of files you have.
Another possibility is to precompute: you first walk your whole file tree once and count the total number of files (without keeping the list of files, just the count!), then you walk it again and provide tqdm with the file count you precomputed:
import os
from time import sleep
from tqdm import tqdm

def walkdir(folder):
    """Walk through every file in a directory"""
    for dirpath, dirs, files in os.walk(folder):
        for filename in files:
            yield os.path.abspath(os.path.join(dirpath, filename))

# Precomputing files count
filescount = 0
for _ in tqdm(walkdir(target_dir)):
    filescount += 1

# Computing for real
for filepath in tqdm(walkdir(target_dir), total=filescount):
    sleep(0.01)
    # etc...
Notice that I defined a wrapper function over os.walk: since you are working on files and not on directories, it's better to define a function that progresses over files rather than over directories.
However, you can get the same result without using the walkdir wrapper, but it will be a bit more complicated, as you have to resume the last progress bar state after each subfolder that gets traversed:
# Precomputing
filescount = 0
for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    filescount += len(fileList)

# Computing for real
last_state = 0
for dirPath, subdirList, fileList in os.walk(target_dir):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]
    for fname in tqdm(fileList, total=filescount, initial=last_state):
        # do whatever you want here...
        pass
    # Update last state to resume the progress bar
    last_state += len(fileList)
Upvotes: 4
Reputation: 1
Here is my solution to a similar problem:
import os
import time
import tqdm

for root, dirs, files in os.walk(local_path):
    # One progress bar per directory, sized by that directory's file count
    for fname in tqdm.tqdm(files):
        full_fname = os.path.join(root, fname)
        time.sleep(0.1)  # stands in for the real work on full_fname
Upvotes: 0
Reputation: 44634
You can't show a percentage complete unless you know what "complete" means.
While os.walk is running, it doesn't know how many files and folders it's going to end up iterating: the return type of os.walk has no __len__. It'd have to look all the way down the directory tree, enumerating all the files and folders, in order to count them. In other words, os.walk would have to do all of its work twice in order to tell you how many items it's going to produce, which is inefficient.
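A quick way to see this for yourself is the small check below. It is just an illustration using the current directory; the variable names are mine.

```python
import os

walker = os.walk(".")  # creates the generator; no directory is read yet

# Generators do not implement __len__, so there is no total to read off.
has_len = hasattr(walker, "__len__")
print(has_len)  # False

try:
    len(walker)
    len_error = None
except TypeError as exc:
    # len() refuses because the object cannot report a size up front.
    len_error = exc
    print("len() failed:", exc)
```

This is why tqdm can only show a running count and rate when you hand it the bare os.walk result.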
If you're dead set on showing a progress bar, you could spool the data into an in-memory list: list(os.walk(target_dir)). I don't recommend this. If you're traversing a large directory tree this could consume a lot of memory. Worse, if followlinks is True and you have a cyclic directory structure (with children linking to their parents), then it could end up looping forever until you run out of RAM.
Upvotes: 10
Reputation: 11837
It's because tqdm doesn't know how long the result of os.walk will be: it's a generator, so len can't be called on it. You can fix this by converting os.walk(target_dir) to a list first:
for dirPath, subdirList, fileList in tqdm(list(os.walk(target_dir))):
From the documentation of the tqdm module:
len(iterable) is used if possible. As a last resort, only basic progress statistics are displayed (no ETA, no progressbar).
But len(os.walk(target_dir)) isn't possible, so there is no ETA or progress bar.
As Benjamin pointed out, using list does use some memory, but not too much. A spooled directory of ~190,000 files caused Python to use about 65MB of memory with this code on my Windows 10 machine.
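If you want a rough figure for your own tree, something like the sketch below might help. It is an assumption-laden sketch: `walk_list_memory` is a hypothetical helper name, and it uses the standard library's tracemalloc, which only counts allocations made by Python itself, so the result will not match what a task manager reports for the whole process.

```python
import os
import tracemalloc


def walk_list_memory(folder):
    """Return (entry_count, peak_bytes) for materializing os.walk(folder)."""
    tracemalloc.start()
    entries = list(os.walk(folder))  # one (dirpath, dirs, files) tuple per directory
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(entries), peak


if __name__ == "__main__":
    count, peak = walk_list_memory(".")
    print(f"{count} directories, peak of about {peak / 1024:.0f} KiB traced")
```

Note that the list has one entry per directory, not per file, so its length is also what tqdm would use as the total.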
Upvotes: 2