Xingdong

Reputation: 1455

Why does a nested dictionary slow down numpy save?

Suppose we have the following code:

import numpy as np

D = []
for _ in range(200):
    d = []
    for _ in range(300):
        # each element is a dict holding two 64x64x3 float arrays
        d.append({'a': np.random.randn(64, 64, 3), 'b': np.random.randn(64, 64, 3)})
    D.append(d)

np.save('data', D)

It takes a really long time to save this data. Is there something wrong with the code, or is it because of the dictionary objects?

----- Update -----

By taking the arrays out of the dictionaries, even with the same total data size, saving becomes dramatically faster. So it seems it is the dictionaries that slow down the process. Is there some reason for that?

i.e.

import numpy as np

D1 = []
D2 = []
for _ in range(200):
    d1 = []
    d2 = []
    for _ in range(300):
        d1.append(np.random.randn(64, 64, 3))
        d2.append(np.random.randn(64, 64, 3))
    D1.append(d1)
    D2.append(d2)

np.save('d1', D1)
np.save('d2', D2)
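
What np.save receives in each case seems to confirm this: a nested list of dicts can only become a dtype=object array, which np.save has to serialize element by element via pickle, while a nested list of equal-shaped arrays stacks into a single float64 ndarray written as one raw binary buffer. A minimal sketch (tiny shapes, just for illustration):

import numpy as np

# Dicts are opaque Python objects to NumPy, so this becomes a (2, 3)
# array of dtype=object, which np.save must pickle (slow).
dict_data = [[{'a': np.zeros((4, 4))} for _ in range(3)] for _ in range(2)]
print(np.asarray(dict_data).dtype)   # object

# Equal-shaped arrays stack into a (2, 3, 4, 4) float64 ndarray,
# which np.save writes as one contiguous binary block (fast).
flat_data = [[np.zeros((4, 4)) for _ in range(3)] for _ in range(2)]
print(np.asarray(flat_data).dtype)   # float64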

Upvotes: 0

Views: 506

Answers (1)

John Zwinck

Reputation: 249542

Here is code that does something similar in an efficient, vectorized way, without slow Python for loops:

np.savez('data',
    a=np.random.randn(200, 300, 64, 64, 3),
    b=np.random.randn(200, 300, 64, 64, 3))

The output format is a little different: it's more compact and will be more efficient to read back.
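
For reference, reading it back looks like this (a minimal sketch; np.load on an .npz archive returns a lazy NpzFile, and each array is only decoded when you access it by key):

import numpy as np

data = np.load('data.npz')
print(data['a'].shape)  # (200, 300, 64, 64, 3)
print(data['b'].shape)  # (200, 300, 64, 64, 3)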

Note that this is almost 12 GB of data, so of course it will take a while to generate the random numbers and write them to disk. If your real data has lower entropy than random numbers, you may consider using savez_compressed() to enable compression and save some disk space (at the cost of CPU time when saving and loading).
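
The compressed variant takes the same keyword arguments, so switching is a one-line change (a sketch; whether it helps depends entirely on how compressible your real data is):

import numpy as np

# Same keyword-argument interface as np.savez; writes a zip-compressed .npz.
np.savez_compressed('data_compressed',
    a=np.random.randn(200, 300, 64, 64, 3),
    b=np.random.randn(200, 300, 64, 64, 3))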

Upvotes: 1
