Reputation: 15
I am using pd.read_excel
to read large excel files. How do I add in a GUI progress bar to inform users the progress while the system is reading the file?
I came across tqdm and PySimpleGui as the most commonly used progress bar. PySimpleGUI OneLineProgress Meter seems to be the solution to my need, but I'm too novice to add pd.read_excel
in. Can anyone teach me how to work this out? Thank you
for i in range(1000): # this is your "work loop" that you want to monitor
sg.OneLineProgressMeter('One Line Meter Example', i + 1, 1000, 'key')
Upvotes: 0
Views: 1016
Reputation: 4322
It's not really easy to do with read_excel
due to the function lacking any means of tracking progress, unlike read_csv
where you can use chunksize
parameter to return an interator and then you can update your progress based on how many chunks you've loaded out of the total.
This won't work in case of excel files so your only option is to use a combination of nrows
and skiprows
. To do this we first need to find out how many rows the file has. There are some neat ways of doing this with little overhead (see answers here) but these won't work for and Excel file so what you do is load one entire column of the sheet and find out the length that way:
# Get the total number of rows
df_temp = pd.read_excel(sheet_name='balbla', usecols=[0])
rows = df_temp.shape[0]
# Now load the file in chunks of 1000 rows at a time
chunks = rows//1000 + 1
chunk_list = []
for i in range(chunks):
tmp = pd.read_excel(sheet_name='blabla', nrows=1000, skiprows=[k for k in range(i*1000)])
chunk_list.append(chunks)
# Update progress
sg.OneLineProgressMeter('Loading excel file', i + 1, chunks)
df = pd.concat((f for f in chunk_list), axis=0)
This should work but will come with significant overhead. Unfortunately, there's no easy way to do this for excel files and you're better off using some other data format.
Upvotes: 2