Nick Jones

Reputation: 225

MemoryError with NumPy on several large arrays

So I am trying to carry out the following calculations on a series of large arrays, but I keep getting the error:

MemoryError

In total there are nine grain_size arrays of 2745 by 2654 (note: I could use a single float here instead of an array, since every cell holds the same number and it doesn't change), nine g_pro arrays of 2745 by 2654, and the nine arrays I create below.

So my question is: is there a way to work around this issue?

# Create empty arrays to store the information
Fs1 = np.zeros_like(g_pro_1, dtype = float)
Fs2 = np.zeros_like(g_pro_1, dtype = float)
Fs3 = np.zeros_like(g_pro_1, dtype = float)
Fs4 = np.zeros_like(g_pro_1, dtype = float)
Fs5 = np.zeros_like(g_pro_1, dtype = float)
Fs6 = np.zeros_like(g_pro_1, dtype = float)
Fs7 = np.zeros_like(g_pro_1, dtype = float)
Fs8 = np.zeros_like(g_pro_1, dtype = float)
Fs9 = np.zeros_like(g_pro_1, dtype = float)

# Check where the condition is true
np.putmask(Fs1, np.logical_and(grain_size_1_array > 0.0000625, grain_size_1_array <= 0.002), g_pro_1)
np.putmask(Fs2, np.logical_and(grain_size_2_array > 0.0000625, grain_size_2_array <= 0.002), g_pro_2)
np.putmask(Fs3, np.logical_and(grain_size_3_array > 0.0000625, grain_size_3_array <= 0.002), g_pro_3)
np.putmask(Fs4, np.logical_and(grain_size_4_array > 0.0000625, grain_size_4_array <= 0.002), g_pro_4)
np.putmask(Fs5, np.logical_and(grain_size_5_array > 0.0000625, grain_size_5_array <= 0.002), g_pro_5)
np.putmask(Fs6, np.logical_and(grain_size_6_array > 0.0000625, grain_size_6_array <= 0.002), g_pro_6)
np.putmask(Fs7, np.logical_and(grain_size_7_array > 0.0000625, grain_size_7_array <= 0.002), g_pro_7)
np.putmask(Fs8, np.logical_and(grain_size_8_array > 0.0000625, grain_size_8_array <= 0.002), g_pro_8)
np.putmask(Fs9, np.logical_and(grain_size_9_array > 0.0000625, grain_size_9_array <= 0.002), g_pro_9)

Fs = Fs1 + Fs2 + Fs3 + Fs4 + Fs5 + Fs6 + Fs7 + Fs8 + Fs9
Fs[self.discharge == -9999] = -9999

The code that works for me now is:

import numpy as np

Fs = np.zeros_like(g_pro_1, dtype=float)

grain_array_list = [self.grain_size_1, self.grain_size_2, self.grain_size_3,
                    self.grain_size_4, self.grain_size_5, self.grain_size_6,
                    self.grain_size_7, self.grain_size_8, self.grain_size_9]
proportions_list = [g_pro_1, g_pro_2, g_pro_3, g_pro_4, g_pro_5, g_pro_6, g_pro_7, g_pro_8, g_pro_9]

# Each self.grain_size_n is a single float, so this is a scalar comparison, not a mask
for proportion, grain in zip(proportions_list, grain_array_list):
    if 0.0000625 < grain <= 0.002:
        print(grain)
        Fs += proportion  # add in place instead of allocating a new array each time

Fs[self.discharge == -9999] = -9999
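For reference, here is a self-contained sketch of the approach above that runs without the rest of the class. The function name sand_fraction, the NODATA constant and the tiny 2 x 2 grids are hypothetical stand-ins; the thresholds and the -9999 no-data value come from the question.

```python
import numpy as np

LOWER, UPPER = 0.0000625, 0.002  # grain-size bounds used in the question
NODATA = -9999                   # no-data marker used in the question

def sand_fraction(grain_sizes, proportions, discharge):
    """Sum the proportion arrays whose scalar grain size lies in (LOWER, UPPER]."""
    Fs = np.zeros_like(proportions[0], dtype=float)
    for grain, proportion in zip(grain_sizes, proportions):
        # grain is one float per class, so this is a plain Python bool,
        # not a full-size boolean mask
        if LOWER < grain <= UPPER:
            Fs += proportion  # in place: no new full-size array is allocated
    Fs[discharge == NODATA] = NODATA
    return Fs

# Tiny synthetic example (hypothetical values: 2 x 2 grid, three classes)
grain_sizes = [0.00001, 0.0005, 0.01]
proportions = [np.full((2, 2), 0.2), np.full((2, 2), 0.5), np.full((2, 2), 0.3)]
discharge = np.array([[1.0, -9999.0], [2.0, 3.0]])
print(sand_fraction(grain_sizes, proportions, discharge))
```

Because each grain size is a single float, the condition is evaluated once per class rather than once per cell, which removes both the full-size boolean masks and the nine intermediate Fs arrays.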

Answers (2)

mbatchkarov

Reputation: 16109

Every time you see lines of code that differ only by a single character, you should be using a loop. In your case, you are holding data in memory that you are not using. Your workflow is basically:

  • load a grain_size_array (and its g_pro array)
  • apply the mask condition to grain_size_array
  • add the masked g_pro values to an accumulator (Fs)
  • dispose of the mask and grain_size_array
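The same workflow can also be written without a zero-filled temporary buffer by using the `where=` argument of the NumPy ufunc `np.add` together with `out=`, so only the cells selected by the mask are updated in place. This is a swapped-in technique, not the one used in the answer's code; the names accumulate_sand and load_classes are hypothetical, and load_classes merely stands in for however each class is read from disk.

```python
import numpy as np

LOWER, UPPER = 0.0000625, 0.002  # thresholds from the question

def accumulate_sand(classes, shape):
    """Add each g_pro into Fs wherever its grain size lies in (LOWER, UPPER].

    `classes` is any iterable of (grain_size_array, g_pro) pairs. Passing a
    generator that loads one pair at a time avoids holding all nine pairs.
    """
    Fs = np.zeros(shape, dtype=float)
    for grain_size_array, g_pro in classes:
        mask = (grain_size_array > LOWER) & (grain_size_array <= UPPER)
        # Only cells where mask is True are updated; the rest keep their value,
        # so no zero-filled temporary array is needed.
        np.add(Fs, g_pro, out=Fs, where=mask)
    return Fs

def load_classes(shape):
    """Hypothetical loader standing in for reading each class from disk."""
    for grain_size in [0.00001, 0.0005, 0.001, 0.01]:
        yield np.full(shape, grain_size), np.full(shape, 0.25)

Fs = accumulate_sand(load_classes((3, 4)), (3, 4))
print(Fs)
```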

In terms of code, you need something like:

import numpy as np

g_pro_1 = load()  # however you get that
Fs = np.zeros_like(g_pro_1, dtype=float)
Fs_tmp = np.zeros_like(g_pro_1, dtype=float)
for i in range(1, 10):  # the nine size classes
    g_pro = load()  # whatever
    grain_size_array = load()  # whatever
    Fs_tmp.fill(0.0)  # clear values left over from the previous class
    np.putmask(Fs_tmp, np.logical_and(grain_size_array > 0.0000625, grain_size_array <= 0.002), g_pro)
    Fs += Fs_tmp
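A quick way to check that a single-accumulator loop gives the same Fs as the question's nine-array version is to run both on small random data. The grid shape, seed and random inputs below are hypothetical stand-ins for the real 2745 by 2654 grids. The reusable buffer is zeroed before each class; otherwise values from an earlier class that fall outside the current mask would be added a second time.

```python
import numpy as np

LOWER, UPPER = 0.0000625, 0.002
rng = np.random.default_rng(42)
shape = (4, 5)  # small stand-in for 2745 x 2654

# Hypothetical data: nine grain-size grids and nine proportion grids
grain_sizes = [rng.uniform(0.0, 0.004, shape) for _ in range(9)]
g_pros = [rng.random(shape) for _ in range(9)]

# Question's approach: nine full-size Fs arrays, then one big sum
Fs_arrays = []
for grain, g_pro in zip(grain_sizes, g_pros):
    Fs_n = np.zeros_like(g_pro, dtype=float)
    np.putmask(Fs_n, np.logical_and(grain > LOWER, grain <= UPPER), g_pro)
    Fs_arrays.append(Fs_n)
Fs_all = sum(Fs_arrays)

# Loop approach: one accumulator plus one reusable buffer
Fs = np.zeros(shape, dtype=float)
Fs_tmp = np.zeros(shape, dtype=float)
for grain, g_pro in zip(grain_sizes, g_pros):
    Fs_tmp.fill(0.0)  # clear the previous class's values
    np.putmask(Fs_tmp, np.logical_and(grain > LOWER, grain <= UPPER), g_pro)
    Fs += Fs_tmp

print(np.allclose(Fs, Fs_all))  # True
```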

Phillip

Reputation: 13678

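Your example requires 9*2745*2654*8 bytes (NumPy's default float is the 8-byte float64), i.e. about 500 MiB, to store the grain_size arrays, and again as much to store the g_pro arrays. The nine Fs<n> arrays you create need a further 500 MiB, and summing them creates temporary arrays of the same size. To run the logical_and functions, the boolean arrays holding the results of the comparisons must be stored too; at one byte per cell each takes about 7 MiB, so the nine calls together add on the order of another 100 MiB of allocations. Maybe you simply run out of memory eventually?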
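The arithmetic above can be reproduced directly from the grid size and NumPy's item sizes. The grouping into 27 grids (nine grain_size, nine g_pro and nine Fs<n> arrays) follows the question's description.

```python
import numpy as np

rows, cols = 2745, 2654
cells = rows * cols                     # 7,285,230 cells per grid
float_bytes = np.dtype(float).itemsize  # 8: NumPy's default float is float64
bool_bytes = np.dtype(bool).itemsize    # 1: one byte per boolean

MiB = 2 ** 20
per_float_array = cells * float_bytes / MiB
per_bool_array = cells * bool_bytes / MiB

print(f"one float64 grid:    {per_float_array:.1f} MiB")        # 55.6 MiB
print(f"nine float64 grids:  {9 * per_float_array:.1f} MiB")    # 500.2 MiB
print(f"one boolean mask:    {per_bool_array:.1f} MiB")         # 6.9 MiB
# grain_size, g_pro and Fs1..Fs9: three sets of nine grids
print(f"27 float64 grids:    {27 * per_float_array / 1024:.2f} GiB")  # 1.47 GiB
```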

You could either try

  • to increase the physical or swap memory available on your system, or
  • to create and process the Fs<n> arrays one after another rather than having each of them in memory at the same time
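A rough way to compare the two strategies is Python's tracemalloc module, to which NumPy reports its array allocations. The sketch below measures peak traced memory for "all arrays at once" versus "one class after another" on a smaller grid. SHAPE, make_class and its constant values are hypothetical stand-ins for the real data loading, and the exact numbers will vary with the NumPy version, but the sequential version should peak far lower.

```python
import tracemalloc
import numpy as np

LOWER, UPPER = 0.0000625, 0.002
SHAPE = (500, 500)  # smaller stand-in grid
N = 9               # number of size classes, as in the question

def make_class(i):
    """Hypothetical loader: returns (grain_size_array, g_pro) for class i."""
    grain = np.full(SHAPE, 0.0001 * (i + 1))  # constant grain size per class
    g_pro = np.full(SHAPE, 1.0 / N)
    return grain, g_pro

def all_at_once():
    classes = [make_class(i) for i in range(N)]  # all 18 input grids alive
    parts = []
    for grain, g_pro in classes:
        Fs_n = np.zeros(SHAPE)
        np.putmask(Fs_n, (grain > LOWER) & (grain <= UPPER), g_pro)
        parts.append(Fs_n)                       # nine Fs grids alive as well
    return sum(parts)

def one_after_another():
    Fs = np.zeros(SHAPE)
    for i in range(N):
        grain, g_pro = make_class(i)  # roughly one pair alive at a time
        Fs += np.where((grain > LOWER) & (grain <= UPPER), g_pro, 0.0)
    return Fs

def peak_mib(func):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    result = func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / 2 ** 20

Fs_a, peak_all = peak_mib(all_at_once)
Fs_b, peak_seq = peak_mib(one_after_another)
print(f"all at once:       {peak_all:.1f} MiB peak")
print(f"one after another: {peak_seq:.1f} MiB peak")
```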
