Reputation: 33
I want to create a NumPy kernel matrix of dimensions 25000×25000. I want to know the most efficient way to handle such a large matrix in terms of saving it to disk and loading it back. I tried dumping it with pickle, but it threw an error saying it cannot serialize objects larger than 4 GiB.
Upvotes: 0
Views: 345
Reputation: 2623
Why not save the array as a file instead of using pickle?
np.savetxt("filename", array)
It can then be read back with
np.genfromtxt("filename")
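Note that writing 25000×25000 floats as text will be slow and produce a very large file. NumPy's binary .npy format avoids both problems and, unlike pickle, has no 4 GiB limit. A minimal sketch (the filename is just an example):
import numpy as np

# Save in NumPy's binary .npy format
np.save("kernel.npy", array)

# Load it back; mmap_mode="r" memory-maps the file instead of
# reading the whole array into RAM at once
array_loaded = np.load("kernel.npy", mmap_mode="r")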
Upvotes: 1
Reputation: 31
You could try saving it in an HDF5 file with pandas.HDFStore() (this requires the PyTables package):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(25000, 25000).astype('float16'))

# Memory footprint in GB (1024**3 bytes per GB)
memory_use = round(df.memory_usage(deep=True).sum() / 1024**3, 2)
print('uses {} GB'.format(memory_use))

store = pd.HDFStore('test.h5', 'w')
store['data'] = df
store.close()
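Since the question also asks about loading, a short sketch of reading the matrix back (assuming the 'data' key used above):
# Read the DataFrame back from the HDF5 file
with pd.HDFStore('test.h5', 'r') as store:
    df_loaded = store['data']

# Convert back to a plain NumPy array if needed
kernel = df_loaded.to_numpy()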
Upvotes: 1