Reputation: 2063
Let's say we have a list of strings so big that, saved as a normal text file (one element per line), it would be 1 GB in size.
Currently I use this code to save the list:
import codecs
savefile = codecs.open("BigList.txt", "w", "utf-8")
savefile.write("\r\n".join(BigList))
savefile.close()
As soon as execution reaches "\r\n".join(BigList), I see a huge bump in memory usage, and saving the results takes considerable time (~1 min).
Any tips or suggestions for handling this list with less memory usage and saving it to disk more quickly?
Upvotes: 2
Views: 1341
Reputation: 23500
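To save disk space you could do:
from gzip import GzipFile
with GzipFile('dump.txt', 'w') as fh:
    # GzipFile is a binary stream, so encode the text before writing
    fh.write('\r\n'.join(BigList).encode('utf-8'))
(Also note the use of the with statement instead of calling close() manually.)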
Combine this with a for loop to save memory:
from gzip import GzipFile
with GzipFile('dump.txt', 'w') as fh:
    for item in BigList:
        # Write one item at a time so the joined string is never built
        fh.write((str(item) + '\r\n').encode('utf-8'))
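As a variation on the loop above, here is a minimal sketch that opens the gzip file in text mode so the encoding is handled by the file object. It assumes Python 3 (where gzip.open accepts encoding and newline in text mode); the function name write_gzip_lines and the path argument are hypothetical, not from the original answer.

```python
import gzip

def write_gzip_lines(items, path):
    """Stream items to a gzip-compressed UTF-8 text file, one per line."""
    # newline="" disables newline translation, so "\r\n" is written verbatim
    with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
        for item in items:
            fh.write(str(item) + "\r\n")
```

Only one item is held as a formatted string at any time, so peak memory stays close to the size of the list itself.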
And to do it really quickly you could potentially do this (saves memory, disk space and time):
import pickle
from gzip import GzipFile
with GzipFile('dump.pckl', 'wb') as fh:
    pickle.dump(BigList, fh)
Note, however, that this list would only be readable by external programs that understand Python's pickle format. But assuming you want to re-use BigList in your own application, pickle is the way to go.
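For completeness, a minimal sketch of the round trip: saving the list and loading it back in your own application. The helper names save_list and load_list are hypothetical, and the choice of pickle.HIGHEST_PROTOCOL is an assumption that usually gives a more compact binary format than the default. Only unpickle files you created yourself, since pickle.load can execute arbitrary code from untrusted data.

```python
import gzip
import pickle

def save_list(items, path):
    """Pickle the list into a gzip-compressed file."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(items, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_list(path):
    """Load a list previously written by save_list."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

Usage would look like save_list(BigList, 'dump.pckl') followed later by BigList = load_list('dump.pckl').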
I noticed a comment about you reading a big text file in order to write to another file.
In that case the above is an approach that would work for you.
But if you want to save memory or time when working across two files, consider the following instead:
with open('file_one.txt', 'rb') as inp:
    with open('file_two.txt', 'wb') as out:
        for line in inp:
            # do_work is your own per-line processing function
            out.write(do_work(line) + b'\r\n')
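The same idea can be wrapped in a small function. This is only a sketch: transform_file is a hypothetical name, and it assumes do_work takes and returns bytes. Lines read from a file keep their trailing line ending, so the sketch strips it before calling do_work to avoid writing doubled line breaks.

```python
def transform_file(src_path, dst_path, do_work):
    """Copy src to dst line by line, applying do_work to each line."""
    with open(src_path, "rb") as inp, open(dst_path, "wb") as out:
        for line in inp:
            # Drop the original line ending, then write a uniform "\r\n"
            out.write(do_work(line.rstrip(b"\r\n")) + b"\r\n")
```

Because the input file is iterated lazily, only one line is in memory at a time regardless of file size.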
Upvotes: 1
Reputation: 8017
for line in BigList:
    savefile.write(line + '\n')
I would do it by iterating.
Upvotes: 1
Reputation: 98118
The join in:
"\r\n".join(BigList)
creates one very large string in memory (roughly the size of the whole output file) before anything is written. It is much more memory-efficient to use a for loop:
for line in BigList:
    savefile.write(line + "\r\n")
Another question is: why do you have so many strings in memory in the first place?
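An equivalent way to write the loop is to pass a generator expression to writelines, which also avoids building the joined string. This is a sketch under assumptions: write_lines is a hypothetical helper, and it uses io.open (available in both Python 2.6+ and 3) with newline="" so the "\r\n" endings are written unchanged. If the strings come from some other source, lines can be any iterable, so the full list never has to exist in memory.

```python
import io

def write_lines(path, lines, eol="\r\n"):
    """Write each string from an iterable to a UTF-8 file, one per line."""
    # newline="" turns off newline translation on all platforms
    with io.open(path, "w", encoding="utf-8", newline="") as fh:
        # The generator yields one terminated line at a time
        fh.writelines(line + eol for line in lines)
```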
Upvotes: 3