Reputation: 81
import csv

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for i in reader:
        print(i)
CSV
id,name
001,jane
002,winky
003,beli
...
So far the program only reads the CSV from the beginning: if it is restarted, it starts again at the first row, 001. How can I resume reading where it left off? For example, if the program stopped after reading 002, the next run should start at 003.
Upvotes: 0
Views: 1016
Reputation: 123501
To do this, you'll need to save the current location in another file each time a row is read from the CSV file, which will, of course, add some overhead to processing it.
I think creating a context manager type for use with a with statement would be a very good way to solve this, and it allows the overhead to be minimized to some degree.
The code below implements a context manager for reading CSV files that allows reading to be automatically resumed if it's interrupted before the whole file has been read (within the context of the with statement).
This is done by creating a separate "state" file to keep track of the last row successfully read. The state file is deleted if no exception occurs while reading; if an exception does occur, the file is left in place. Because of that, the next time the CSV file is read, the existing state file will be detected and used to allow reading to start where it previously left off.
Notably, since each resumable CSV reader is a separate object, you can create and use more than one at a time. The associated "state" file for each one remains open while the CSV file is being read, so it doesn't need to be repeatedly opened and closed each time its contents are updated.
import csv
import os


class ResumableCSVReader:
    def __init__(self, filename):
        self.filename = filename
        self.state_filename = filename + '.state'
        self.csvfile = None
        self.statefile = None

    def __enter__(self):
        self.csvfile = open(self.filename, 'r', newline='')
        try:  # Open and read state file.
            with open(self.state_filename, 'r', buffering=1) as statefile:
                self.start_row = int(statefile.read())
        except (FileNotFoundError, ValueError):  # No existing or usable state file.
            self.start_row = 0
        self.statefile = open(self.state_filename, 'w', buffering=1)
        # Opening with 'w' truncates the file, so write the start row back at once.
        self.statefile.write(str(self.start_row) + '\n')
        return _CSVReaderContext(self)

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.csvfile:
            self.csvfile.close()
        if self.statefile:
            self.statefile.close()
        if not exc_type:  # No exception?
            os.remove(self.state_filename)  # Delete state file.


class _CSVReaderContext:
    def __init__(self, resumable):
        self.resumable = resumable
        self.reader = csv.reader(self.resumable.csvfile)
        # Skip to start row.
        for _ in range(self.resumable.start_row):
            next(self.reader)
        self.current_row = self.resumable.start_row

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)  # Raises StopIteration at end of file.
        self.current_row += 1
        # Update state file.
        self.resumable.statefile.seek(0)
        self.resumable.statefile.write(str(self.current_row) + '\n')
        return row


if __name__ == '__main__':
    csv_filename = 'resumable_data.csv'

    # Read a few rows and raise an exception.
    try:
        with ResumableCSVReader(csv_filename) as resumable:
            for _ in range(2):
                print('row:', next(resumable))
            raise MemoryError('Forced')  # Cause exception.
    except MemoryError:
        pass  # Expected, suppress to allow test to keep running.

    # CSV file is now closed.

    # Resume reading where left-off and continue to end of file.
    print('\nResume reading\n')
    with ResumableCSVReader(csv_filename) as resumable:
        for row in resumable:
            print('row:', row)

    print('\ndone')
Output:
row: ['id', 'name']
row: ['001', 'jane']
Resume reading
row: ['002', 'winky']
row: ['003', 'beli']
done
Upvotes: 1
Reputation: 20540
Use the magic of generators:
import csv

def get_rows(infile='test.csv'):
    with open(infile, newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            yield row

for id, name in get_rows():
    out = some_complex_business_logic(id, name)
    print(out)
The generator will pause while you run your complex business logic, and then transparently resume when you're ready for the next row.
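To see the pausing in action, here is a minimal sketch. It assumes an in-memory io.StringIO with made-up sample rows in place of test.csv, and a variant of get_rows that takes an open file object. Keep in mind that the generator only remembers its place while the process is running; the position is not kept across restarts.

```python
import csv
import io

def get_rows(f):
    """Yield rows one at a time from an open text file (or file-like object)."""
    for row in csv.reader(f):
        print('read', row)      # runs only when the consumer asks for a row
        yield row

# In-memory stand-in for test.csv (made-up sample data).
sample = io.StringIO('001,jane\n002,winky\n003,beli\n')

rows = get_rows(sample)
first = next(rows)              # reads exactly one row, then pauses
print('processing', first)
second = next(rows)             # resumes where the generator paused
print('processing', second)
```

Each "read" line is printed only when next() is called, which shows that no row is fetched ahead of time.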
Upvotes: 0
Reputation: 121
In this case you have to explicitly save the current location each time, which may be somewhat expensive computationally, but it works. Here is the code:
import csv

def update_last(x):
    # Store the number of rows processed so far.
    with open('last.txt', 'w') as file:
        file.write(str(x))

def get_last():
    # Return the number of rows processed by earlier runs (0 if none).
    try:
        with open('last.txt', 'r') as file:
            return int(file.read().strip())
    except (FileNotFoundError, ValueError):
        with open('last.txt', 'w') as file:
            file.write('0')
        return 0

with open('your_file.txt', 'r', newline='') as f:
    reader = csv.reader(f)
    last = get_last()  # rows already processed
    current = 0
    for i in reader:
        current += 1
        if current <= last:
            continue  # skip rows handled in a previous run
        print(i)
        update_last(current)
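If the program can be killed while last.txt is being written, the file may be left empty or half-written. One possible safeguard, sketched here under assumptions, is to write to a temporary file and rename it over the real one. The helper names save_checkpoint and load_checkpoint are hypothetical, and the extra flush and fsync add to the cost per row.

```python
import os
import tempfile

def save_checkpoint(path, rows_done):
    """Write rows_done to a temp file, then rename it over path."""
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w') as f:
        f.write(str(rows_done))
        f.flush()
        os.fsync(f.fileno())      # push the data to disk before renaming
    os.replace(tmp_path, path)    # on POSIX the rename is atomic

def load_checkpoint(path):
    """Return the saved row count, or 0 if there is no usable checkpoint."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0

# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, 'last.txt')
    before = load_checkpoint(checkpoint)   # no file yet
    save_checkpoint(checkpoint, 7)
    after = load_checkpoint(checkpoint)
print(before, after)
```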
Upvotes: 0
Reputation: 23
For this you need to keep track of how far into the file you have read so far; file.tell() may come in handy. Afterwards you can resume reading from that point using file.seek().
The code would look somewhat like:
def read_from_position(last_position):
    with open("file_location") as file:
        file.seek(last_position)
        file.readline()  # Do what you want with this
        return file.tell()  # this is the updated last position
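Applied to a CSV file, the idea might look like the sketch below. It assumes that every record fits on one physical line (no quoted fields containing newlines), and it uses a hypothetical helper read_next_row plus a temporary file with made-up data. It reads with readline() rather than iterating over the file, because iterating a text file with next() disables tell().

```python
import csv
import os
import tempfile

def read_next_row(path, last_position):
    """Read one CSV row starting at last_position; return (row, new_position)."""
    with open(path, 'r', newline='') as f:
        f.seek(last_position)
        line = f.readline()              # readline() keeps tell() usable
        if not line:
            return None, last_position   # end of file
        row = next(csv.reader([line]))   # parse the single line as CSV
        return row, f.tell()

# Demo with a temporary file standing in for the real CSV.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    with open(path, 'w', newline='') as f:
        f.write('id,name\n001,jane\n002,winky\n')

    position = 0        # in a real program this would be saved to disk
    rows = []
    while True:
        row, position = read_next_row(path, position)
        if row is None:
            break
        rows.append(row)
print(rows)
```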
You can achieve the same in your code by keeping track of how many lines you have already read and skipping that many lines before continuing.
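A minimal sketch of the counting approach, assuming the number of rows already read is known (here it is hard-coded as 3) and using itertools.islice to skip them. The helper name rows_after and the in-memory sample data are made up for illustration.

```python
import csv
import io
from itertools import islice

def rows_after(f, rows_done):
    """Return an iterator over the CSV rows that follow the first rows_done rows."""
    return islice(csv.reader(f), rows_done, None)

# In-memory stand-in for the CSV file (made-up sample data).
sample = io.StringIO('id,name\n001,jane\n002,winky\n003,beli\n')

remaining = list(rows_after(sample, 3))   # pretend 3 rows were read last time
print(remaining)
```

Skipping parsed rows rather than raw lines keeps the count correct even when a quoted field spans several lines, at the cost of re-parsing the skipped part of the file on every restart.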
Upvotes: 0