Reputation: 425
There have been questions asked about memory errors in Python, but I want to ask one more specific to my situation. I am new to programming and Python.
When parsing a large text file (~8GB), the line
mylist = [line.strip('\n').split('|') for line in f]
resulted in "MemoryError".
I am running the 64-bit build of Python [MSC v.1500 64 bit (AMD64)] on Windows XP 64-bit with 12GB of RAM. How can I handle this MemoryError other than installing more RAM?
Upvotes: 0
Views: 4498
Reputation: 250881
You get the MemoryError because you're trying to store the whole file in a list, which lives entirely in memory. Instead, process each line as you read it rather than storing it:
for line in f:
    data = line.strip('\n').split('|')
    # do something here with data
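To see why an 8GB file does not fit in 12GB of RAM once parsed, here is a rough sketch that estimates the in-memory size of one parsed line with sys.getsizeof. The helper name parsed_size and the sample record are hypothetical, and the exact numbers depend on the Python version and platform. The point is that each line becomes a list object plus one str object per field, and every one of those objects carries its own fixed overhead on top of the characters.

```python
import sys

def parsed_size(line):
    """Approximate bytes used by line.strip('\n').split('|') (hypothetical helper)."""
    fields = line.strip('\n').split('|')
    # The list object itself (it stores pointers to the field strings)...
    total = sys.getsizeof(fields)
    # ...plus every str object it points to, each with its own header.
    total += sum(sys.getsizeof(field) for field in fields)
    return total

sample = "12345|John|Smith|2012-01-01|NY\n"  # hypothetical record
print("%d bytes on disk vs %d bytes in memory" % (len(sample), parsed_size(sample)))
```

Multiplying that per-line overhead by the number of lines in an 8GB file shows why the list comprehension runs out of memory.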
Upvotes: 5
Reputation: 142106
My take on it: use a with statement so the file is closed properly even if an error occurs, a generator expression to define what lines should look like, then iterate over that:
with open('somefile') as fin:
    lines = (line.strip('\n').split('|') for line in fin)
    for line in lines:
        pass  # do something with line
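As a concrete example of "do something with line" that keeps memory flat, the sketch below counts how often each value appears in one column while streaming through the file. The function name count_column, the column index, and the io.StringIO stand-in for the real file are assumptions for illustration. Only the Counter grows, with one entry per distinct value rather than one per line.

```python
import io
from collections import Counter

def count_column(fin, column):
    """Count the values of one '|'-separated column without storing the lines."""
    counts = Counter()
    rows = (line.strip('\n').split('|') for line in fin)  # lazy, one row at a time
    for row in rows:
        if len(row) > column:  # skip short or malformed rows
            counts[row[column]] += 1
    return counts

# Stand-in for open('somefile'); with a real file, use "with open(...) as fin:".
fake_file = io.StringIO(u"1|NY|a\n2|CA|b\n3|NY|c\n")
print(count_column(fake_file, 1))
```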
Upvotes: 1
Reputation:
You should definitely use a lazy generator to parse such a huge file one line at a time, or divide the file into smaller chunks.
One possibility:
import sys

def lazy_reader(path):
    """Reads a file one line at a time."""
    try:
        f = open(path, 'r')
    except IOError:
        sys.stderr.write("error while opening file at %s\n" % path)
        sys.exit(2)
    try:
        while True:
            line = f.readline()
            if not line:
                break
            yield line  # "outputs" the line from the generator
    finally:
        f.close()  # also runs if the generator is closed before the end
You can then consume the generator like this:
for line in lazy_reader("path/to/your/file"):
    do_something_with(line)
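The answer above also mentions dividing the file into smaller chunks. One way to do that, sketched here under the assumption that a batch of a few thousand lines fits comfortably in memory, is to pull fixed-size batches from any line iterator with itertools.islice. The function name in_batches and the batch size are hypothetical choices, not part of the original answer.

```python
from itertools import islice

def in_batches(lines, size):
    """Yield lists of at most `size` items taken lazily from `lines`."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:  # iterator exhausted
            return
        yield batch

# Usage with the generator above (file path is a placeholder):
# for batch in in_batches(lazy_reader("path/to/your/file"), 10000):
#     process(batch)  # hypothetical: handle 10000 lines at a time
```

Only one batch is held in memory at a time, so the batch size directly bounds memory use.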
EDIT: you can also combine generators in a neat "pipelined" way:
def parse(generator):
    for line in generator:
        yield line.strip('\n').split('|')

for data in parse(lazy_reader("path/to/your/file")):
    do_something_with_splitted_array(data)
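The same pipelining idea extends naturally: each stage is a small generator that takes an iterator and yields transformed items, so stages can be added without ever materializing the whole file. The extra stages below, skip_blank and select_fields, are hypothetical examples, and the small list stands in for the output of lazy_reader.

```python
def skip_blank(lines):
    """Drop lines that contain only whitespace."""
    for line in lines:
        if line.strip():
            yield line

def parse(lines):
    """Split each line on '|' (same idea as the parse stage above)."""
    for line in lines:
        yield line.strip('\n').split('|')

def select_fields(rows, indexes):
    """Keep only the columns at the given positions."""
    for row in rows:
        yield [row[i] for i in indexes]

# A list stands in for lazy_reader("path/to/your/file") here.
source = ["a|b|c\n", "\n", "d|e|f\n"]
for data in select_fields(parse(skip_blank(source)), [0, 2]):
    print(data)
```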
Upvotes: 1
Reputation: 1003
It depends what you want to do with your list.
If you want to work on a line-by-line basis, you can probably get the job done using a generator expression instead of a list comprehension, which looks like this:
myiterator = (line.strip('\n').split('|') for line in f)
(Note that I changed [...] to (...).) This returns an iterator instead of a list, and since for line in f also doesn't build a list, only one line is loaded at a time.
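A quick way to see the difference between the two forms is to compare the objects themselves with sys.getsizeof. This sketch uses a made-up list of lines in place of the file. The list comprehension's size grows with the number of lines, while the generator expression stays a small fixed-size object until you iterate over it. Exact byte counts vary by Python version.

```python
import sys

lines = ["%d|field|value\n" % i for i in range(100000)]  # stand-in for the file

as_list = [line.strip('\n').split('|') for line in lines]  # builds every row now
as_gen = (line.strip('\n').split('|') for line in lines)   # builds rows on demand

print("list: %d bytes, generator: %d bytes" % (sys.getsizeof(as_list), sys.getsizeof(as_gen)))
# getsizeof(as_list) counts only the list's pointer array, not the rows it
# refers to, so the real gap is even larger than the printed numbers suggest.
```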
If you want to work on all lines at once, you will probably have to combine this with another technique to avoid using all your memory.
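One such technique, not named in the original answer, is to stream the rows into an on-disk SQLite database with Python's built-in sqlite3 module and then run whole-file queries there instead of in RAM. The table name rows, the generated column names, and the assumption that every line has exactly ncols fields are all hypothetical. Because executemany accepts any iterator, the generator expression feeds rows in as they are read.

```python
import io
import sqlite3

def load_rows(fin, conn, ncols):
    """Stream '|'-separated lines from fin into table `rows` (hypothetical schema)."""
    cols = ", ".join("c%d TEXT" % i for i in range(ncols))
    conn.execute("CREATE TABLE IF NOT EXISTS rows (%s)" % cols)
    placeholders = ", ".join("?" * ncols)  # e.g. "?, ?" for two columns
    rows = (line.strip('\n').split('|') for line in fin)  # lazy generator
    conn.executemany("INSERT INTO rows VALUES (%s)" % placeholders, rows)
    conn.commit()

# ":memory:" keeps this demo self-contained; pass a file path such as
# "parsed.db" instead so the data lives on disk rather than in RAM.
conn = sqlite3.connect(":memory:")
load_rows(io.StringIO(u"1|NY\n2|CA\n3|NY\n"), conn, 2)
print(conn.execute("SELECT c1, COUNT(*) FROM rows GROUP BY c1 ORDER BY c1").fetchall())
```

Lines with a different number of fields would make executemany raise an error, so real data may need a filtering stage first.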
Upvotes: 3