Reputation: 113
I'm getting a memory error when trying to split a very big string.
data = load_data(file_name) # loads data string from file
splited_data = data.split('\n\n')
Why is this happening, and how can it be fixed? I'm working with Python 2.7.
Upvotes: 1
Views: 1380
Reputation: 16941
The function load_data is reading the entire file into memory, and it is clear you don't have enough memory to do that. So you will have to abandon the idea of having a read phase followed by a processing phase. Instead, read your file a line at a time, and process the lines as you get them.
This will split your file into strings in the same way as data.split('\n\n'), but one line at a time:
with open("mybigfile.txt", "r") as f:
    for line in f:
        mydata = line.rstrip()
        if mydata:
            do_something_with(mydata)
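If the pieces you want are the blank-line-separated blocks that data.split('\n\n') would produce, the same line-at-a-time approach can collect lines into a block and yield the block whenever a blank line is reached. A rough sketch of that idea (mybigfile.txt and do_something_with are the same placeholders as above; edge cases such as runs of several blank lines may not match str.split exactly):

def read_blocks(path):
    # Collect lines until a blank line (the '\n\n' boundary), then
    # yield the block; only one block is held in memory at a time.
    block = []
    with open(path, "r") as f:
        for line in f:
            if line.strip() == "":
                if block:
                    yield "".join(block).rstrip("\n")
                block = []
            else:
                block.append(line)
    if block:
        yield "".join(block).rstrip("\n")

for chunk in read_blocks("mybigfile.txt"):
    do_something_with(chunk)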
Upvotes: 2
Reputation: 55884
If you are processing the parts of the string one by one, you can use a generator to emit each part separately; this will reduce the amount of memory used because you won't generate a list of all the parts, as you do with str.split.
>>> s = 'abc\n\ndef\n\nghi'
>>> def splitter(s):
...     chars = []
...     for x in s:
...         chars.append(x)
...         # Check for split characters and yield string
...         if chars[-2:] == ['\n', '\n']:
...             yield ''.join(chars[:-2])
...             chars = []
...     yield ''.join(chars)
...
>>>
>>> for word in splitter(s):
...     print word
...
abc
def
ghi
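Since splitter only consumes one character at a time, it does not need the whole string in memory either. For example, it can be fed the characters of the file lazily (a sketch, reusing the file name and the do_something_with placeholder from the answer above):

import itertools

with open("mybigfile.txt") as f:
    # A file iterates line by line; chaining the lines together
    # yields one character at a time, so only the current part
    # is ever buffered inside splitter.
    for part in splitter(itertools.chain.from_iterable(f)):
        do_something_with(part)

Reading character by character is slow in pure Python, but it keeps memory usage flat regardless of file size.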
Upvotes: 0