user1427661

Reputation: 11774

Python Progress Bar - Is Threading the Answer Here?

I've done some research on progress bars in Python, and a lot of the solutions seem to be based on work being divided into known, discrete chunks. That is, you iterate a known number of times and write an updated progress bar to stdout each time another percentage point of the iterations completes.

My problem is a little less discrete. It involves walking a user directory that contains hundreds of sub-directories, gathering MP3 information, and entering it into a database. I could probably count the number of MP3 files in the directory before iterating and use that as a guideline for discrete chunks, but many of the MP3s may already be in the database, some files will take longer to read than others, errors will occur and have to be handled in some cases, etc. Besides, I'd like to know how to pull this off with non-discrete chunks for future reference. Here is the code for my directory-walk/database-update, if you're interested:

import os
import sqlite3 as lite
import sys

from mutagen.mp3 import MP3, HeaderNotFoundError

# startDir, isMP3 and pathDict are assumed to be defined elsewhere in the script.
for root, dirs, files in os.walk(startDir):
    for file in files:
        if isMP3(file):
            fullPath = os.path.join(root, file)

            # Check if path already exists in DB, skip iteration if so
            if unicode(fullPath, errors="replace") in pathDict:
                continue

            try:
                audio = MP3(fullPath)
            except HeaderNotFoundError:  # Invalid file/ID3 info
                # TODO: log for user to look up what files were not visitable
                continue
            # Do database operations and error handling therein.

Is threading the best way to approach something like this? And if so, are there any good examples on how threading achieves this? I don't want a module for this because (a) it seems like something I should know how to do and (b) I'm developing for a dependency-lite situation.

Upvotes: 3

Views: 1072

Answers (1)

freakish

Reputation: 56467

If you don't know how many steps are ahead of you, how can you report progress? That's the first thing: you have to count all of the steps before starting the job.
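As a rough sketch of that counting idea (not the asker's actual code), the walk can be split into two passes: one that only collects the MP3 paths so the total is known, and one that does the work and redraws a text bar on stdout after every file. The names `is_mp3`, `find_mp3s`, `render_bar` and `process_all` are made up for illustration, `is_mp3` is only a guess at what the question's `isMP3` does, and the sketch assumes Python 3 and that `handle` deals with its own errors.

```python
import os
import sys


def is_mp3(name):
    # Hypothetical stand-in for the question's isMP3 helper.
    return name.lower().endswith(".mp3")


def find_mp3s(start_dir):
    """First pass: collect every MP3 path so the total is known up front."""
    paths = []
    for root, dirs, files in os.walk(start_dir):
        for name in files:
            if is_mp3(name):
                paths.append(os.path.join(root, name))
    return paths


def render_bar(done, total, width=40):
    """Return a text bar of `width` cells followed by a percentage."""
    fraction = float(done) / total if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return "[%s] %3d%%" % (bar, int(fraction * 100))


def process_all(paths, handle, out=sys.stdout):
    """Second pass: each file is one step, whether it is skipped or read."""
    total = len(paths)
    for done, path in enumerate(paths, 1):
        handle(path)  # assumed to catch and log its own errors
        out.write("\r" + render_bar(done, total))  # \r redraws the same line
        out.flush()
    out.write("\n")
```

Files that are already in the database still count as a step here, so the bar simply moves faster over them.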

Even if the tasks differ in how long they take to finish, you should not worry about that. Think about games: their progress bars sometimes seem to stall at one point and then jump ahead very fast. That is exactly what's happening under the hood: some tasks take longer than others. It's not a big deal (unless a single task is really long, like minutes maybe?).

Of course you can use threads. It can be quite simple with a Queue and a ThreadPool: start, for example, 20 threads and fill a Queue with the jobs. Your progress is then the number of items already taken off the Queue, with the Queue's initial length as the upper limit. This seems like a good design.
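One possible shape for that design, using only the standard library, is sketched below. It swaps the ThreadPool for plain `threading.Thread` workers and tracks progress with a lock-protected counter of finished jobs rather than the queue size, so a job that has been picked up but is still running is not counted as done. The function name `run_with_progress`, the default of 4 workers and the 0.1 second refresh interval are all arbitrary choices for illustration, and the code assumes Python 3 (the module is named `Queue` in Python 2).

```python
import queue
import sys
import threading
import time


def run_with_progress(jobs, handle, num_workers=4, out=sys.stdout):
    """Run handle(job) for every job on worker threads and print progress."""
    jobs = list(jobs)
    total = len(jobs)
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    done = [0]  # boxed in a list so the worker closure can update it
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            try:
                handle(job)
            except Exception:
                pass  # a failed job still counts as a step; real code should log it
            finally:
                with lock:
                    done[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The main thread only reports: finished jobs out of the initial total.
    while True:
        with lock:
            finished = done[0]
        out.write("\r%d/%d" % (finished, total))
        out.flush()
        if finished >= total:
            break
        time.sleep(0.1)

    for t in threads:
        t.join()
    out.write("\n")
    return finished
```

One caveat for the database part: by default a `sqlite3` connection may only be used from the thread that created it, so it is probably simplest to let the workers read the MP3 files and keep the database writes in a single thread.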

Upvotes: 3
