coding rookie

Reputation: 11

pkl file is too large to load

I am trying to learn Python and deep learning. My teacher sent me a .pkl file containing the data I need. The file is 9.6 GB, but my machine has only 16 GB of RAM. When I try to load the whole file with pickle.load(open('data.pkl', 'rb')), my computer crashes :(
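If I understand correctly, pickle.load already reads from the file object incrementally, so the crash comes from the deserialized object itself not fitting in memory, not from the file read. For reference, the same load written with a context manager so the file gets closed (a minimal sketch):

import pickle

# pickle.load streams bytes from the file as it parses, but the
# resulting Python object must fit in RAM in one piece
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)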

I then tried reading the file into memory in buffered blocks, but my computer crashed again :( Here is the buffering code:

import pickle
import gc

block_size = 512 * 1024 * 1024  # 512 MB per read
data = b''
count_num = 0
with open('../data.pkl', 'rb') as f:
    while True:
        buffer = f.read(block_size)
        if not buffer:
            break
        count_num += 1
        data += buffer  # keeps accumulating the whole file in RAM
        print("read " + str(count_num * 512) + " MB")
        gc.collect()
print("finish")

After that, I tried splitting the large file into small files, but I can't load the resulting pieces: I get UnpicklingError: pickle data was truncated and UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified. Below is the splitting code:

import pickle
import gc

block_size = 10 * 1024 * 1024  # 10 MB per chunk
count_num = 0
with open('../data.pkl', 'rb') as f:
    while True:
        buffer = f.read(block_size)
        if not buffer:
            break
        count_num += 1
        print("read " + str(count_num * 10) + " MB")
        # re-pickle the raw byte chunk into its own small file
        with open("data/wiki-data-statement-" + str(count_num) + ".pkl", "wb") as fw:
            pickle.dump(buffer, fw)
        print("split block " + str(count_num))
        gc.collect()
print("finish")

Could someone kindly suggest how I can solve this problem? Suggestions for other tools that can handle this task would also be appreciated. Thanks

Upvotes: 0

Views: 449

Answers (0)
