user1839897

Reputation: 425

Memory Error When Parsing Large File - Python

There have been questions asked about memory errors in Python, but I want to ask one more specific to my situation. I am new to programming and Python.

When parsing a large text file (~8GB), the line

mylist = [line.strip('\n').split('|') for line in f]

resulted in "MemoryError".

I am running 64-bit Python [MSC v.1500 64 bit (AMD64)] on Windows XP 64-bit with 12 GB of RAM. How can I handle this MemoryError other than installing more RAM?

Upvotes: 0

Views: 4498

Answers (4)

Ashwini Chaudhary

Reputation: 250881

The memory error happens because you're trying to store the whole file in a list (which lives in memory). Work on each line as you read it instead of storing them all:

for line in f:
    data = line.strip('\n').split('|')
    # do something here with data

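If you only need an aggregate rather than the raw lines, you can fold each record into a running result so that nothing but the summary stays in memory. A minimal sketch, assuming (hypothetically) that you want to count occurrences of the first field; the filename is a placeholder:

from collections import defaultdict

counts = defaultdict(int)
with open('bigfile.txt') as f:  # placeholder path
    for line in f:
        data = line.strip('\n').split('|')
        counts[data[0]] += 1  # only the running counts are kept in memory
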
Upvotes: 5

Jon Clements

Reputation: 142106

My take on it uses with to make error handling easier, a generator expression to define what the lines should look like, and then iterates over that:

with open('somefile') as fin:
    lines = (line.strip('\n').split('|') for line in fin)
    for line in lines:
        pass # do something with line

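If the fields are never quoted and '|' never appears inside a field, the standard csv module can do the same lazy splitting for you; a sketch under those assumptions:

import csv

with open('somefile') as fin:
    rows = csv.reader(fin, delimiter='|')
    for row in rows:
        pass  # row is already a list of fields, produced one line at a time

csv.reader consumes the file lazily, so this keeps the same one-line-at-a-time memory profile.
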
Upvotes: 1

user1835027

Reputation:

You should definitely use a lazy generator to parse such a huge file one line at a time, or divide the file into smaller chunks.

One possibility:

import sys

def lazy_reader(path):
    """Reads a file one line at a time."""
    try:
        with open(path, 'r') as f:   # with closes the file even on errors
            for line in f:
                yield line           # "outputs" one line from the generator
    except IOError:
        sys.stderr.write("error while opening file at %s\n" % path)
        sys.exit(2)

and then you can consume your generator like this:

for line in lazy_reader("path/to/your/file"):
    do_something_with(line)

EDIT: you can also combine generators in a neat "pipelined" way:

def parse(generator):
    for line in generator:
        yield line.strip('\n').split('|')

for data in parse(lazy_reader("path/to/your/file")):
    do_something_with_splitted_array(data)

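A further (hypothetical) stage can be slotted into the same pipeline without changing its memory profile, for example a filter:

def keep(generator, predicate):
    for data in generator:
        if predicate(data):
            yield data

# e.g. skip records whose first field is empty (an illustrative predicate)
for data in keep(parse(lazy_reader("path/to/your/file")), lambda d: d[0]):
    do_something_with_splitted_array(data)
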
Upvotes: 1

Jonathan Ballet

Reputation: 1003

It depends what you want to do with your list.

If you want to work on a line-by-line basis, you can probably get the job done using a generator expression instead of a list comprehension, which will look like this:

myiterator = (line.strip('\n').split('|') for line in f)

(note that I changed [...] to (...)). This returns an iterator instead of a list, and since for line in f doesn't create a list either, you will load one line at a time.

If you want to work on all lines at once, you will probably have to combine this with another technique so that you don't use all your memory.

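One such technique, if a bounded slice of the file fits in memory, is to consume the iterator in fixed-size chunks with itertools.islice; a sketch, with the chunk size of 100000 chosen arbitrarily and the filename as a placeholder:

from itertools import islice

with open('bigfile.txt') as f:  # placeholder path
    myiterator = (line.strip('\n').split('|') for line in f)
    while True:
        chunk = list(islice(myiterator, 100000))  # at most 100000 parsed rows at once
        if not chunk:
            break
        pass  # work on the whole chunk, then let it be freed
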
Upvotes: 3
