Reputation: 1684
Let's say I have a list:
test_list = [2, 5, 3, 6]
number_of_elements = len(test_list)
Then enumerate can be used with number_of_elements to track the progress of a loop as follows:
for j, element in enumerate(test_list, start=1):
    # do something with element
    print('completed {} out of {}'.format(j, number_of_elements))
Large CSV files can be read in chunks as shown below (reference answer):
chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)
How can I track the progress of this loop? I tried the following:
file_chunks = pd.read_csv(file_name, chunksize=100000)
number_of_chunks = len(file_chunks)
for j, chunk in enumerate(pd.read_csv(file_name, chunksize=100000)):
    print(j, number_of_chunks)
This raises the following error:
TypeError: object of type 'TextFileReader' has no len()
Upvotes: 0
Views: 194
Reputation: 1018
You almost have it. The only problem is that pd.read_csv with chunksize returns a lazy reader, so there is no easy way for len to know how many chunks the file holds before reading it.
If you did:
file_chunks = pd.read_csv(file_name, chunksize=100000)
for i, chunk in enumerate(file_chunks):
    print(i)
That would work, but it only shows how many chunks have been read so far, not how many remain.
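If a total is wanted for a "j out of N" message, one option is a cheap first pass that counts the rows and derives the expected number of chunks. The sketch below is only an illustration under some assumptions: count_chunks and process_with_progress are hypothetical helper names, the file is assumed to be a plain comma-separated CSV with one header row, and blank lines are assumed to be skipped (the pandas default). It also reads the file twice, which may or may not be acceptable for very large files.

```python
import csv
import math


def count_chunks(file_name, chunksize, has_header=True):
    """Estimate how many chunks a chunked read of file_name should yield.

    Hypothetical helper: counts non-blank CSV records with the stdlib
    csv module, then divides by chunksize and rounds up.
    """
    with open(file_name, newline="") as f:
        n_rows = sum(1 for row in csv.reader(f) if row)
    if has_header:
        # The header line is not part of the data rows.
        n_rows = max(n_rows - 1, 0)
    return math.ceil(n_rows / chunksize)


def process_with_progress(file_name, chunksize=100000):
    """Read the CSV in chunks and print progress against the estimated total."""
    import pandas as pd  # imported here so count_chunks works without pandas

    number_of_chunks = count_chunks(file_name, chunksize)
    reader = pd.read_csv(file_name, chunksize=chunksize)
    for j, chunk in enumerate(reader, start=1):
        # process(chunk) would go here
        print('completed {} out of {}'.format(j, number_of_chunks))
```

Calling process_with_progress(file_name) would then print one progress line per chunk. If the file uses a different delimiter or has no header, the counting step would need to be adjusted to match the arguments passed to pd.read_csv.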
Also, this is a great use case for Dask (a Python library that mirrors much of the pandas API for files too big to fit in memory).
Upvotes: 1