Reputation: 1227
I'm trying to perform an ordinary matrix multiplication between two huge matrices (10 x 25,000,000). My memory runs out when I do so. How could I use numpy's memmap to handle this? Is this even a good idea? I'm not so worried about the speed of the operation; I just want the result, even if it means waiting a while. Thanks in advance!
8 GB RAM, i7-2617M 1.5 GHz, Windows 7 64-bit. I'm using the 64-bit version of everything: Python (2.7), NumPy, SciPy.
Edit1:
Maybe h5py is a better option?
Upvotes: 3
Views: 1357
Reputation: 1002
Try numpy.memmap and numexpr! That way the work is done from disk and the CPU cache without exhausting RAM, much like a Fortran-style loop. There is some code here: python - way to do fast matrix multiplication and reduction while working in memmaps and CPU. But beware of the size of the files this creates: if they are only temporary files, remove them afterwards; if not, I suppose it's best to combine them into a pandas HDF5 file with compression level 9. So you create the data with tofile, load it with memmap, calculate, save the memmap result to pandas HDF5, and delete the memmap. Storing the data as a single row in the HDF5 file is also an option that should take less space; I think I read about that somewhere. Also, when you memmap one-row data with numpy, just give it the proper shape and order, and numpy's memmap will read that one-row data in the chosen shape. See the sketch below.
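A rough sketch of that workflow (the file names, the chunk width and the HDF5 key below are placeholders I made up, not something from the linked post; saving to HDF5 needs PyTables installed):

    import numpy as np
    import pandas as pd

    n_rows, n_cols = 10, 25000000

    # Assumes the matrix was previously dumped as raw float64 with A.tofile('a_data.dat').
    a = np.memmap('a_data.dat', dtype='float64', mode='r', shape=(n_rows, n_cols))

    # Compute A.dot(A.T), a 10x10 result, in column chunks so only a small
    # slice of the memmap is resident in RAM at any time.
    result = np.zeros((n_rows, n_rows))
    chunk = 1000000  # chunk width is arbitrary; tune it to your available memory
    for start in range(0, n_cols, chunk):
        block = np.asarray(a[:, start:start + chunk])  # pull one slice into memory
        result += np.dot(block, block.T)

    # Optionally store the (tiny) result in a compressed HDF5 file via pandas,
    # then drop the memmap.
    pd.DataFrame(result).to_hdf('result.h5', key='aat', complevel=9, complib='zlib')
    del a

The chunked loop keeps only one slice of the memmap in memory at a time; the 10x10 result itself is tiny, so the HDF5 step is mainly useful if you also want to keep intermediate files around in compressed form.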
Upvotes: 1
Reputation: 8773
You might try to use np.memmap and compute the 10x10 output matrix one element at a time. So you just load the first row of the first matrix and the first column of the second, and then np.sum(row1 * col1).
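A minimal sketch of that idea, assuming both operands were dumped to raw files with tofile (the file names and shapes here are made up for illustration):

    import numpy as np

    n, k, m = 10, 25000000, 10
    A = np.memmap('A.dat', dtype='float64', mode='r', shape=(n, k))
    B = np.memmap('B.dat', dtype='float64', mode='r', shape=(k, m))

    out = np.zeros((n, m))
    for i in range(n):
        row = np.asarray(A[i, :])          # one row (~200 MB as float64) in memory
        for j in range(m):
            col = np.asarray(B[:, j])      # one column of the second matrix
            out[i, j] = np.sum(row * col)  # dot product for a single output element

Note that pulling a column out of a row-major memmapped file touches almost the whole file, so the inner loop is slow; if you control how the second matrix is written, storing it transposed makes the column reads cheap.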
Upvotes: 2