I have recently been trying to learn Python and deep learning. My teacher sent me a pkl file containing the data I need. The pkl file is 9.6 GB, but my machine has only 16 GB of RAM.
When I try to load the whole file with pickle.load(open('data.pkl', 'rb')), my computer crashes :(
I then tried to read the pkl file through a buffer, and my computer crashed again :( Below is the buffering code:
import pickle
import gc

block_size = 512 * 1024 * 1024  # 512 MB per read
data = b''
count_num = 0
with open('../data.pkl', 'rb') as f:
    while True:
        buffer = f.read(block_size)
        if not buffer:
            break
        count_num += 1
        data += buffer
        print("read " + str(count_num * 512) + " MB")
        gc.collect()
print("finish")
After that, I tried splitting the large file into small files, but I can't load the split files back because of UnpicklingError: pickle data was truncated
and UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
Below is the splitting code (a small check that reproduces the error follows it):
import pickle
import gc

block_size = 10 * 1024 * 1024  # 10 MB per chunk
count_num = 0
with open('../data.pkl', 'rb') as f:
    while True:
        buffer = f.read(block_size)
        if not buffer:
            break
        count_num += 1
        print("read " + str(count_num * 10) + " MB")
        # write each raw slice out as its own pickle file
        with open("data/wiki-data-statement-" + str(count_num) + ".pkl", "wb") as fw:
            pickle.dump(buffer, fw)
        print("split block " + str(count_num))
        gc.collect()
print("finish")
Could someone kindly suggest how I can solve this problem? Any suggestions about other tools that can handle this task would also be appreciated. Thanks!