morbyosef

Reputation: 113

Python MemoryError trying to split large string

I'm getting a memory error when trying to split a very big string.

data = load_data(file_name) # loads data string from file
split_data = data.split('\n\n')

Why does this happen, and how can it be fixed? I'm working with Python 2.7.

Upvotes: 1

Views: 1380

Answers (2)

BoarGules

Reputation: 16941

The function load_data is reading the entire file into memory, and it is clear you don't have enough memory to do that. So you will have to abandon the idea of having a read phase followed by a processing phase. Instead, read your file a line at a time and process the lines as you get them.

This reads one line at a time and, as long as each record occupies a single line with records separated by blank lines, yields the same pieces as data.split('\n\n'):

with open("mybigfile.txt", "r") as f:
    for line in f:
        mydata = line.rstrip()
        if mydata:
            do_something_with(mydata)
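
If your records can span multiple lines, a small accumulator keeps the one-line-at-a-time reading while still yielding blank-line-separated blocks. This is only a sketch, reusing the do_something_with placeholder from above; note that runs of consecutive blank lines are treated as a single separator:

def read_blocks(f):
    # Collect lines into a block until a blank separator line appears.
    block = []
    for line in f:
        line = line.rstrip('\n')
        if line:
            block.append(line)
        elif block:
            yield '\n'.join(block)
            block = []
    if block:  # emit the final block if the file doesn't end with a blank line
        yield '\n'.join(block)

with open("mybigfile.txt", "r") as f:
    for record in read_blocks(f):
        do_something_with(record)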

Upvotes: 2

snakecharmerb

Reputation: 55884

If you are processing the parts of the string one by one, you can use a generator to emit each part separately; this reduces memory use because you never build a list of all the parts, as str.split does.

>>> s = 'abc\n\ndef\n\nghi'

>>> def splitter(s):
...     chars = []
...     for x in s:
...         chars.append(x)
...         # Check for split characters and yield string
...         if chars[-2:] == ['\n', '\n']:
...             yield ''.join(chars[:-2])
...             chars = []
...     yield ''.join(chars)
... 
>>> 
>>> for word in splitter(s):
...     print word
... 
abc
def
ghi
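
Note that splitter still needs the whole string in memory, so it only saves you the list that str.split would build. The same idea combined with chunked reads keeps just a small buffer at a time; this is a sketch, and the 4096-byte chunk size is an arbitrary assumption:

def split_stream(f, sep='\n\n', chunk_size=4096):
    # Hold only a buffer of unsplit text; yield each complete part
    # as soon as a separator has been seen.
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        parts = buf.split(sep)
        buf = parts.pop()  # the last piece may still be incomplete
        for part in parts:
            yield part
    yield buf  # whatever remains after EOF is the final part

with open("mybigfile.txt", "r") as f:
    for part in split_stream(f):
        print part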

Upvotes: 0
