YashaswiniPrathivadi

Reputation: 43

MemoryError while reading from a large file (~2.5GB) and storing into a python list

I am trying to process lines in a really huge file using Python. I found the best ways to read a large file from the many previously answered questions here on Stack Overflow. I picked one of those approaches and tested it as follows:

fIn = open(fileName, 'rU')
fOut = open(fileName1, 'w')
while 1:
    lines = fIn.readlines(100000)  # read roughly 100000 bytes' worth of lines per pass
    if not lines:
        break
    for line in lines:
        fOut.write(line)
fIn.close()
fOut.close()

This worked like magic and I was able to successfully read lines from one file and write them to another without encountering any MemoryError.

But what I now want to do, instead of writing the lines read from one file into another file, is to store them in a list and then do my further processing on the list. My code to store the lines in a list is shown below:

fIn = open(fileName, 'rU')
fOut = open(fileName1, 'w')
d = []
while 1:
    lines = fIn.readlines(100000)
    if not lines:
        break
    for line in lines:
        d.append(line)  # the entire file accumulates in d, so memory grows with file size
fIn.close()

This code raises a MemoryError, and the stack trace printed on the prompt shows that the last line executed before the error is the d.append(line) line. So writing large amounts of data into a list is definitely causing the error. The error appears a few seconds into the program, so the code is able to store data up to a certain size and then hits a memory fault.

I wanted to know what the best way is to store huge files in lists within Python without encountering a MemoryError.

Upvotes: 4

Views: 6781

Answers (1)

thefourtheye

Reputation: 239443

Since the processing can be done line by line, the best choice is to iterate over the file object like this:

with open(fileName, 'rU') as fIn:
    for line in fIn:
        process_line(line)

and move all the processing logic into the process_line function. This is the best choice because it gets you only one line at a time, so you are not clogging up memory.
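
For example, here is a minimal, self-contained sketch of that pattern. The body of process_line is just a hypothetical placeholder (it strips trailing whitespace); fileName1 is the output file from your own question, and you would replace the placeholder with whatever per-line work you actually need:

def process_line(line):
    # hypothetical processing step: normalize trailing whitespace;
    # swap in your real per-line logic here
    return line.rstrip() + '\n'

with open(fileName, 'rU') as fIn, open(fileName1, 'w') as fOut:
    for line in fIn:                 # the file object yields one line at a time
        fOut.write(process_line(line))

Because each line is processed and written immediately, only one line is held in memory at any moment, no matter how large the input file is.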

Upvotes: 4
