kino.jom

Reputation: 81

How to resume reading a csv file?

import csv

with open('test.csv', 'r') as f:
   reader = csv.reader(f)
   for i in reader:
      print(i)

CSV

id,name
001,jane
002,winky
003,beli
...

So far the program only reads the CSV once, from the top: if I restart it, it starts again from the first row, 001. How can I resume reading? For example, if the program stopped reading at 002, the next run should start at 003.

Upvotes: 0

Views: 1016

Answers (4)

martineau

Reputation: 123501

To do this, you'll need to continually save the current location in another file each time a row is read from the CSV file, which, of course, will add some overhead to processing it.

I think a context manager type, used together with a with statement, is a very good way to solve this, and it keeps that overhead fairly low.

The code below implements a context manager for reading CSV files that allows reading to resume automatically if it is interrupted before the whole file has been read (within the context of the with statement).

This is done by creating a separate "state" file to keep track of the last row successfully read. This file is deleted if no exception occurs while reading; if one does occur, the file is left in place. The next time the CSV file is read, the existing state file is detected and used to start reading where it previously left off.

Notably, since each resumable CSV reader is a separate object, you can create and use more than one at a time. The associated "state" file for each one remains open while the CSV file is being read, so it doesn't need to be opened and closed again each time its contents are updated.

import csv
import os

class ResumableCSVReader:

    def __init__(self, filename):
        self.filename = filename
        self.state_filename = filename + '.state'
        self.csvfile = None
        self.statefile = None

    def __enter__(self):
        self.csvfile = open(self.filename, 'r', newline='')

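        try:  # Open and read state file
            with open(self.state_filename, 'r') as statefile:
                self.start_row = int(statefile.read())

        except (FileNotFoundError, ValueError):  # No usable state file.
            self.start_row = 0

        # Opening in 'w' mode truncates the file, so write the start row back
        # right away in case reading is interrupted before any row is read.
        self.statefile = open(self.state_filename, 'w', buffering=1)
        self.statefile.write(str(self.start_row) + '\n')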

        return _CSVReaderContext(self)

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.csvfile:
            self.csvfile.close()
        if self.statefile:
            self.statefile.close()
            if not exc_type:  # No exception?
                os.remove(self.state_filename) # Delete state file.


class _CSVReaderContext:

    def __init__(self, resumable):
        self.resumable = resumable
        self.reader = csv.reader(self.resumable.csvfile)

        # Skip to start row.
        for _ in range(self.resumable.start_row):
            next(self.reader)

        self.current_row = self.resumable.start_row

    def __iter__(self):
        return self

    def __next__(self):
        self.current_row += 1
        row = next(self.reader)

        # Update state file.
        self.resumable.statefile.seek(0)
        self.resumable.statefile.write(str(self.current_row)+'\n')

        return row


if __name__ == '__main__':

    csv_filename = 'resumable_data.csv'

    # Read a few rows and raise an exception.
    try:
        with ResumableCSVReader(csv_filename) as resumable:
            for _ in range(2):
                print('row:', next(resumable))

            raise MemoryError('Forced')  # Cause exception.

    except MemoryError:
        pass  # Expected, suppress to allow test to keep running.

    # CSV file is now closed.

    # Resume reading where left-off and continue to end of file.
    print('\nResume reading\n')

    with ResumableCSVReader(csv_filename) as resumable:
        for row in resumable:
            print('row:', row)

    print('\ndone')

Output:

row: ['id', 'name']
row: ['001', 'jane']

Resume reading

row: ['002', 'winky']
row: ['003', 'beli']

done

Upvotes: 1

J_H

Reputation: 20540

Use the magic of generators:

import csv

def get_rows(infile='test.csv'):
    with open(infile, newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            yield row

for id, name in get_rows():
    out = some_complex_business_logic(id, name)
    print(out)

The generator will pause while you run your complex business logic, and then transparently resume when you're ready for the next row.
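Note that a generator on its own only pauses and resumes within a single run. To also pick up where it left off after the program restarts, the generator could record a row count on disk. Here is a minimal sketch of that idea, not a definitive implementation: the state_file parameter and the 'test.csv.pos' file name are my own assumptions and do not appear in the question.

```python
import csv
import os

def get_rows(infile='test.csv', state_file='test.csv.pos'):
    """Yield rows of infile, skipping rows already handed out in a previous run."""
    try:
        with open(state_file) as sf:
            done = int(sf.read())  # Number of rows already yielded.
    except (FileNotFoundError, ValueError):
        done = 0  # No usable state, so start from the first row.

    with open(infile, newline='') as f:
        for n, row in enumerate(csv.reader(f), start=1):
            if n <= done:
                continue  # Row was handed out before the restart.
            with open(state_file, 'w') as sf:
                sf.write(str(n))  # Record the row before handing it out.
            yield row

    # Reached only when the whole file has been consumed.
    if os.path.exists(state_file):
        os.remove(state_file)
```

The row count is saved before each row is yielded, so a row counts as "read" as soon as it is handed out. If you would rather retry a row whose processing failed, you would move the write to after the yield.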

Upvotes: 0

Omi

Reputation: 121

In this case you have to explicitly save the current location after each row, which may be a little expensive computationally, but it works. Here is the code:

import csv


def update_last(x):
    with open('last.txt', 'w') as file:
        file.write(str(x))


def get_last():
    try:
        with open('last.txt', 'r') as file:
            return int(file.read().strip())
    except (FileNotFoundError, ValueError):  # Missing or unreadable state file.
        with open('last.txt', 'w') as file:
            file.write('0')
            return 0

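with open('your_file.txt', 'r', newline='') as f:
    reader = csv.reader(f)
    last = get_last()  # Number of rows already processed in earlier runs.
    for current, row in enumerate(reader, start=1):
        if current <= last:
            continue  # Skip rows handled before the restart.
        print(row)
        update_last(current)  # Record this row as processed.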
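One thing to watch out for: if the program dies while last.txt is being rewritten, the file can end up empty or half-written. A possible variant of update_last, sketched here under the assumption that both files live on the same filesystem, writes to a temporary file first and then swaps it into place with os.replace. The path parameter and the '.tmp' suffix are hypothetical names of mine.

```python
import os

def update_last(x, path='last.txt'):
    """Save x to path so that a crash never leaves a half-written file."""
    tmp_path = path + '.tmp'  # Hypothetical temporary file name.
    with open(tmp_path, 'w') as file:
        file.write(str(x))
        file.flush()
        os.fsync(file.fileno())  # Push the data to disk before the swap.
    os.replace(tmp_path, path)  # Replaces the old file in a single step.
```

This makes each update even more expensive, so it is only worth it if a corrupted position file would be a real problem for you.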

Upvotes: 0

vishu

Reputation: 23

For this you need to keep track of how far into the file you have read so far; file.tell() may come in handy for that. Afterwards you can resume reading from that position using file.seek(). The code would look something like this:

def read_from_position(last_position):
    with open("file_location") as file:  # Closes the file when done.
        file.seek(last_position)
        line = file.readline()  # Do what you want with this line
        return file.tell()  # This is the updated last position

You can achieve the same in your code by keeping track of how many lines you have read so far and skipping that many lines on the next run.
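To combine tell() and seek() with the csv module, something like the sketch below might work. It reads with readline() rather than iterating over the file, because in text mode tell() is disabled while the file is being iterated with next(). It assumes that no quoted field spans more than one line. The function name read_rows_from and its parameters are made up for illustration.

```python
import csv

def read_rows_from(path, last_position=0, max_rows=None):
    """Return (rows, new_position), reading from a position given by tell()."""
    rows = []
    with open(path, newline='') as f:
        f.seek(last_position)
        while max_rows is None or len(rows) < max_rows:
            line = f.readline()
            if not line:
                break  # End of file.
            # Parse this single line as one CSV record.
            rows.append(next(csv.reader([line])))
            last_position = f.tell()  # Position just after this line.
    return rows, last_position
```

You would save the returned position somewhere (a small file, for example) and pass it back in on the next run.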

Upvotes: 0
