Lijo Abraham

Reputation: 881

How to resume reading a file in Python

I have a 15-16 GB file containing JSON objects separated by newlines (\n).

I am new to Python and am reading the file with the following code:

with open(filename, 'rb') as file:
    for data in file:
        dosomething(data)  # each line is one JSON object

If my script fails partway through, say after 5 GB, how can I resume reading from the last read position and continue from there?

I am trying to do this by using file.tell() to get the position and then moving the file pointer back there with seek().

Since the file contains JSON objects, after the seek operation I get the following error:

ValueError: No JSON object could be decoded

I assume that after the seek the file pointer is not at the start of a complete JSON object.

How can I solve this? Is there any other way to resume reading from the last read position in Python?
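The error text is the one Python 2's json module raises, and on Python 2 iterating with for data in file uses a hidden read-ahead buffer, so file.tell() inside that loop likely reports a position past the current line rather than the start of the next one. A minimal Python 3 sketch of the fix, assuming one JSON object per line: read with readline(), call tell() only after a complete line, and seek() back to that value later. The file name sample.jsonl is a hypothetical stand-in for the real file.

```python
import json
import os
import tempfile

# Build a small newline-delimited JSON file to stand in for the 15-16 GB one.
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")  # hypothetical name
with open(path, "wb") as out:
    for n in range(5):
        out.write((json.dumps({"id": n}) + "\n").encode("utf-8"))

# First pass: handle two records, then remember where the next line starts.
with open(path, "rb") as f:
    for _ in range(2):
        json.loads(f.readline())
    saved_offset = f.tell()  # taken right after a full line, so it is a line start

# Later pass: seek to the saved offset; the next line is a complete JSON object.
with open(path, "rb") as f:
    f.seek(saved_offset)
    resumed = [json.loads(line) for line in iter(f.readline, b"")]

print(resumed)  # [{'id': 2}, {'id': 3}, {'id': 4}]
```

Because the offset is only ever taken at a line boundary, json.loads never sees half an object after the seek.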

Upvotes: 3

Views: 1555

Answers (2)

EbraHim

Reputation: 2349

Use another file to store the current location:

# Python 3 version. A separate file holds the last good position.
cur_loc = open("location.txt", "w+")
cur_loc.write('0')
exception = False

i = 0

with open("test.txt", "r") as f:
    while True:
        i += 1
        if exception:
            # Recover: read the saved position and jump back to it.
            cur_loc.seek(0)
            pos = int(cur_loc.readline())
            f.seek(pos)
            exception = False

        try:
            read = f.readline()
            print(read, end='')
            if i == 5:
                print("Exception Happened while reading file!")
                x = 1 / 0  # to make an exception
            # remove above if block and do everything you want here.
            if read == '':
                break
        except Exception:
            exception = True
            cur_loc.seek(0)
            cur_loc.write(str(f.tell()))
            cur_loc.truncate()  # drop stale digits if the new number is shorter

cur_loc.close()

Let's assume we have the following test.txt as the input file:

# contents of test.txt
1
2
3
4
5
6
7
8
9
10

When you run the above program, you get:

>>> ================================ RESTART ================================
>>> 
1
2
3
4
5
Exception Happened while reading file!
6
7
8
9
10 
>>> 
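As written, the program rewrites location.txt with 0 every time it starts, so the saved position only helps within a single run. A sketch that also survives a crash and restart, assuming the checkpoint file holds the byte offset of the next unprocessed line. The names resume_read, handle, data.jsonl and the failure on id 3 are hypothetical, for illustration only.

```python
import json
import os
import tempfile


def resume_read(data_path, checkpoint_path, handle):
    """Process newline-delimited JSON, resuming from a saved byte offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as cp:
            offset = int(cp.read() or 0)

    with open(data_path, "rb") as f:
        f.seek(offset)
        for line in iter(f.readline, b""):
            handle(json.loads(line))
            # Record the new offset only after the line was handled successfully.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as cp:
                cp.write(str(f.tell()))
            os.replace(tmp, checkpoint_path)  # single rename, atomic on POSIX


# Demo: the handler fails on id 3 the first time, then the run is restarted.
workdir = tempfile.mkdtemp()
data = os.path.join(workdir, "data.jsonl")
checkpoint = os.path.join(workdir, "location.txt")
with open(data, "w") as out:
    for n in range(6):
        out.write(json.dumps({"id": n}) + "\n")

seen = []
fail_once = {"armed": True}


def handle(obj):
    if obj["id"] == 3 and fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("simulated crash")
    seen.append(obj["id"])


try:
    resume_read(data, checkpoint, handle)
except RuntimeError:
    pass  # pretend the process died here
resume_read(data, checkpoint, handle)  # the restart picks up at id 3
print(seen)  # [0, 1, 2, 3, 4, 5]
```

Writing the checkpoint after every line keeps the logic simple but costs one small file write per record; for a 15-16 GB file, checkpointing every N lines trades a little repeated work after a crash for far fewer writes.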

Upvotes: 2

Sven Hakvoort

Reputation: 3621

You can use for i, line in enumerate(opened_file) to get the line numbers and store the current value of i. When your script fails, you can display this value to the user. Then add an optional command-line argument for it: if the argument is given, your script calls opened_file.readline() that many times before doing any work. This way you get back to the point where you left off.

# Skip the lines that were already processed in the previous run.
for _ in range(passed_variable):
    opened_file.readline()
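A sketch of this line-count approach, assuming a hypothetical --skip option carries the count and process() stands in for the real per-line work. itertools.islice discards the first N lines without building a list, but it still has to read them, so skipping a 5 GB prefix this way takes noticeably longer than seeking to a saved byte offset.

```python
import argparse
import itertools
import json


def process(obj):
    """Hypothetical stand-in for the real per-record work."""
    pass


def run(path, skip=0):
    """Process a JSON-lines file, skipping the first `skip` lines."""
    done = skip
    with open(path, "rb") as opened_file:
        # islice reads and discards the first `skip` lines, then yields the rest.
        for line in itertools.islice(opened_file, skip, None):
            try:
                process(json.loads(line))
            except Exception:
                # Tell the user where to restart: `done` lines are complete.
                print("Failed; rerun with --skip", done)
                raise
            done += 1
    return done


parser = argparse.ArgumentParser(description="Resumable JSON-lines reader")
parser.add_argument("path")
parser.add_argument("--skip", type=int, default=0,
                    help="number of lines already processed")  # hypothetical flag

# Example: parse an explicit argument list instead of sys.argv.
args = parser.parse_args(["data.jsonl", "--skip", "3"])
print(args.path, args.skip)  # data.jsonl 3
```

In a real script you would call run(args.path, args.skip) after parsing sys.argv.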

Upvotes: 0
