Gere

Reputation: 12707

Iterate over large file with progress indicator in Python?

I'm iterating over a large CSV file and I'd like to print some kind of progress indicator. As I understand it, counting the number of lines would require scanning the whole file for newline characters, so I cannot easily estimate progress by line number.

Is there anything else I can do to estimate the progress while reading in lines? Maybe I can go by size?

Upvotes: 11

Views: 17431

Answers (5)

Piotr Czapla

Reputation: 26552

You can use tqdm with large files in the following way:

import os
import tqdm

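with tqdm.tqdm(total=os.path.getsize(filename)) as pbar:
    with open(filename, "rb") as f:
        for line in f:
            pbar.update(len(line))
            ...

Because the file is opened in binary mode, len(line) is the exact number of bytes. If you open a UTF-8 file in text mode instead, len(line) counts characters rather than bytes, so the bar can fall slightly short of the total, but it should be good enough.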
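To make the character-versus-byte difference concrete, here is a small sketch. The sample string is made up for illustration; it only shows why a text-mode len() can undercount relative to os.path.getsize().

```python
# Hypothetical sample line with two non-ASCII characters (i-diaeresis, e-acute).
line = "na\u00efve caf\u00e9\n"

# In text mode, len() counts characters (code points).
chars = len(line)

# On disk, UTF-8 stores each of the two accented characters as 2 bytes.
raw = len(line.encode("utf-8"))

print(chars, raw)
```

For plain ASCII data the two numbers are identical, which is why the approximation is usually harmless.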

Upvotes: 23

YohanK

Reputation: 495

This is based on @Piotr's answer, adapted for Python 3:

import os
from tqdm import tqdm

with tqdm(total=os.path.getsize(filepath)) as pbar:
    with open(filepath, encoding='utf-8') as file:
        for line in file:
            # Text mode yields str, so re-encode to count bytes
            pbar.update(len(line.encode('utf-8')))
            ...

Upvotes: 6

dmralev

Reputation: 85

Please check out this small (and useful) library named tqdm (https://github.com/noamraph/tqdm). You just wrap an iterator, and a progress meter is shown as the loop executes.

Upvotes: 6

Adel Ahmadyan

Reputation: 174

You can use os.path.getsize (or os.stat) to get the size of your text file. Then whenever you parse a new line, compute the size of that line in bytes and use it as an indicator.

import os
fileName = r"c:\somefile.log"
fileSize = os.path.getsize(fileName)

progress = 0
# Binary mode so that len(line) counts bytes, matching fileSize
with open(fileName, 'rb') as inputFile:
    for line in inputFile:
        progress = progress + len(line)
        progressPercent = 100.0 * progress / fileSize

# in the end, progress == fileSize

Upvotes: 6

Saimadhav Heblikar

Reputation: 712

You can use os.path.getsize(filename) to get the size of your target file. Then as you read data from the file, you can calculate progress percentage using a simple formula currentBytesRead/filesize*100%. This calculation can be done at the end of every N lines.

For the actual progress bar, you can take a look at Text Progress Bar in the Console.
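A minimal standard-library sketch of this approach might look as follows. The names render_bar, process_file and every_n_lines are made up for illustration, and the bar format is just one possible choice, not the one from the linked question.

```python
import os
import sys


def render_bar(fraction, width=40):
    """Return a text bar such as '[####----]  50.0%' for a fraction in [0, 1]."""
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(width * fraction)
    return "[" + "#" * filled + "-" * (width - filled) + "] " + f"{fraction * 100:5.1f}%"


def process_file(filename, every_n_lines=1000, out=sys.stderr):
    """Read filename line by line, redrawing the bar every `every_n_lines` lines."""
    filesize = os.path.getsize(filename)
    bytes_read = 0
    line_count = 0
    # Binary mode so that len(line) is a byte count comparable to filesize.
    with open(filename, "rb") as f:
        for line in f:
            bytes_read += len(line)
            line_count += 1
            # ... handle `line` here ...
            if line_count % every_n_lines == 0 and filesize:
                # "\r" returns to the start of the line so the bar is overwritten.
                out.write("\r" + render_bar(bytes_read / filesize))
                out.flush()
    # Draw the finished bar once at the end.
    out.write("\r" + render_bar(1.0) + "\n")
    return line_count, bytes_read
```

Updating only every N lines keeps the terminal output from slowing down the loop; a suitable N depends on how long each line takes to process.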

Upvotes: 8
