An accurate progress bar for loading files and transforming data using Vaex and Pandas

I am looking for a way to add a progress bar that shows the remaining time when loading a file with Vaex (big data files) or transforming big data with pandas. I have checked this thread https://stackoverflow.com/questions/3160699/python-progress-bar, but unfortunately none of the progress bar snippets there is accurate for my needs: the command has already finished before the progress bar is complete. I am looking for something similar to %time, which prints the time spent by a line or command. In my case I want to see the estimated time and a progress bar for any command without using a for loop.

Here is my code:

from progress.bar import Bar

with Bar('Processing', max=1) as bar:
    %time sample_tolls_amount = df_panda_tolls.sample(n=4999)
    bar.next()

Output:

Processing |################################| 1/1CPU times: total: 11.1 s
Wall time: 11.1 s

A for loop is unnecessary because I only need to run this command once. In fact, with a for loop (max=20), the progress bar was still running when the data (sample_tolls_amount) was already done. Is there any practical way to check the progress of an arbitrary command, just like %time reports its timing?
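A single call such as df_panda_tolls.sample(n=4999) exposes no progress information, so no wrapper can compute a real percentage or ETA for it; the bar in the code above only moves when bar.next() is called. What a wrapper can do is show elapsed time while the call runs. The sketch below uses a hypothetical helper, run_with_elapsed_timer (standard library only, not part of the progress package), that runs the call in the foreground and prints the elapsed time from a background thread. Updates may stall while a C extension holds the GIL.

```python
import sys
import threading
import time


def run_with_elapsed_timer(func, *args, interval=0.5, **kwargs):
    """Call func(*args, **kwargs) while a background thread prints elapsed time.

    Hypothetical helper: an opaque call reports no progress, so only elapsed
    time can be shown, not a percentage or an ETA.
    """
    done = threading.Event()
    start = time.perf_counter()

    def ticker():
        # Event.wait returns False on timeout and True once done.set() is called.
        while not done.wait(interval):
            elapsed = time.perf_counter() - start
            print(f"\rrunning... {elapsed:6.1f} s", end="", file=sys.stderr, flush=True)

    thread = threading.Thread(target=ticker, daemon=True)
    thread.start()
    try:
        return func(*args, **kwargs)
    finally:
        # Stop the ticker even if func raises, then print the final time.
        done.set()
        thread.join()
        elapsed = time.perf_counter() - start
        print(f"\rdone after {elapsed:.1f} s", file=sys.stderr, flush=True)
```

For example: sample_tolls_amount = run_with_elapsed_timer(df_panda_tolls.sample, n=4999). This behaves like %time with a live counter, but it cannot predict how long the call will take.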

I have tried several functions, but none of them shows the real progress of the command. I don't have for loops; I have commands that load or transform big data files. Therefore, every time I run my code, I want to see how much of the work is done and how much time remains, just like downloading a file in a browser: you see how many GB have been downloaded and how much data remains. I am looking for something as easy to apply as %time (a %progress, so to speak).
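A download-style bar works when the total amount of work is known up front and the work is done in pieces. The sketch below, a hypothetical read_with_progress helper using only the standard library, reads a file in fixed-size chunks and prints megabytes done, the total size and an ETA based on the average rate so far. It returns raw bytes as a stand-in for real parsing; with pandas the same idea applies to pandas.read_csv(..., chunksize=...), which returns an iterator of DataFrame chunks instead of one DataFrame.

```python
import os
import sys
import time


def read_with_progress(path, chunk_size=1 << 20, barsize=40):
    """Read a file in chunks, printing MB done, total MB and an ETA.

    Hypothetical helper: returns the raw bytes; real code would parse each chunk.
    """
    total = os.path.getsize(path)  # known total, like a download's Content-Length
    start = time.perf_counter()
    done = 0
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
            done += len(chunk)
            elapsed = time.perf_counter() - start
            rate = done / elapsed if elapsed > 0 else 0.0  # bytes per second so far
            eta = (total - done) / rate if rate > 0 else float("inf")
            filled = int(barsize * done / total) if total else barsize
            print(
                f"\r[{'#' * filled}{'.' * (barsize - filled)}] "
                f"{done / 1e6:.1f}/{total / 1e6:.1f} MB  ETA {eta:.1f} s",
                end="", file=sys.stderr, flush=True,
            )
    print(file=sys.stderr)  # finish the progress line
    return b"".join(chunks)
```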

Answers (1)

darren

I use these two progress bar variants. They need no third-party packages (the time import below only simulates work with time.sleep) and can be embedded in code quite easily.

Simple progress bar:

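import time


n = 25
for i in range(n):
    time.sleep(0.1)  # stand-in for one unit of real work
    progress = int((i + 1) / n * 50)  # i + 1 so the last step reaches the full 50 dots
    print(f'running {i+1} of {n} {progress*"."}', end='\r', flush=True)
print()  # move off the progress line when done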
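The same loop can be wrapped in a reusable generator that also prints an ETA. This is a minimal sketch with a hypothetical progress_iter helper, assuming every remaining item takes the average time of the items seen so far; third-party libraries such as tqdm do this more thoroughly.

```python
import sys
import time


def progress_iter(iterable, total=None, width=50):
    """Yield items from iterable while printing a dotted bar and a simple ETA.

    Hypothetical helper: the ETA assumes each remaining item takes the
    average time of the items processed so far.
    """
    if total is None:
        total = len(iterable)  # needs a sized iterable unless total is given
    start = time.perf_counter()
    for i, item in enumerate(iterable, start=1):
        yield item  # the caller's work on this item happens here
        elapsed = time.perf_counter() - start
        eta = elapsed / i * (total - i)
        filled = int(i / total * width)
        print(f"\rrunning {i} of {total} {'.' * filled:<{width}} ETA {eta:5.1f} s",
              end="", file=sys.stderr, flush=True)
    print(file=sys.stderr)  # finish the progress line
```

Usage: for i in progress_iter(range(25)): time.sleep(0.1). The bar is updated only after the caller has finished with each item, so a step is counted as done when its work is actually complete.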

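More elaborate progress bar:

import time


def print_progressbar(total, current, barsize=60):
    """Redraw one line: a filled bar, percentage, dots for the rest, and current/total."""
    progress = int(current * barsize / total)
    completed = str(int(current * 100 / total)) + '%'
    print('[', chr(9608) * progress, ' ', completed, '.' * (barsize - progress), '] ',
          str(current) + '/' + str(total), sep='', end='\r', flush=True)


total = 600
barsize = 60
# Redraw roughly once per bar cell (at most every 100 steps) instead of on every iteration.
print_frequency = max(min(total // barsize, 100), 1)
print("Start Task..")
for i in range(1, total + 1):
    time.sleep(0.0001)  # stand-in for one unit of real work
    # Always draw the first and last step so the bar appears at once and ends at 100%.
    if i % print_frequency == 0 or i == 1 or i == total:
        print_progressbar(total, i, barsize)
print("\nFinished")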
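To use a bar like this for one large transformation instead of a simulated loop, the work has to be split into chunks so the bar can advance after each one. The sketch below uses a hypothetical transform_in_chunks helper on a plain Python list; for a DataFrame the analogous slices would be df.iloc[start:stop], with the partial results combined by pandas.concat.

```python
import math
import sys


def transform_in_chunks(data, func, chunk_size=1000, barsize=40):
    """Apply func to each chunk of a list, redrawing a bar after every chunk.

    Hypothetical helper: func takes a list chunk and returns an iterable of
    results; the bar advances once per finished chunk.
    """
    n_chunks = max(math.ceil(len(data) / chunk_size), 1)
    results = []
    for k in range(n_chunks):
        chunk = data[k * chunk_size:(k + 1) * chunk_size]
        results.extend(func(chunk))
        filled = int((k + 1) * barsize / n_chunks)
        percent = int((k + 1) * 100 / n_chunks)
        print(f"\r[{chr(9608) * filled}{'.' * (barsize - filled)}] {percent}% "
              f"{k + 1}/{n_chunks} chunks", end="", file=sys.stderr, flush=True)
    print(file=sys.stderr)  # finish the progress line
    return results
```

Usage: transform_in_chunks(list(range(10_000)), lambda c: [x * 2 for x in c], chunk_size=500). Smaller chunks give a smoother bar at the cost of more per-chunk overhead.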