tsb
tsb

Reputation: 53

Needing to assess smaller 3D arrays in larger 3D array with Numpy

I have to take a random integer 50x50x50 array and determine which contiguous 3x3x3 cube within it has the largest sum.

It seems like a lot of splitting features in Numpy don't work well unless the smaller cubes are evenly divisible into the larger one. Trying to work through the thought process I made a 48x48x48 cube that is just in order from 1 to 110,592. I then was thinking of reshaping it to a 4D array with the following code and assessing which of the arrays had the largest sum? when I enter this code though it splits the array in an order that is not ideal. I want the first array to be the 3x3x3 cube that would have been in the corner of the 48x48x48 cube. Is there a syntax that I can add to make this happen?

import numpy as np

arr1 = np.arange(0,110592)
arr2=np.reshape(arr1, (48,48,48))
arr3 = np.reshape(arr2, (4096, 3,3,3))
arr3

output:

array([[[[     0,      1,      2],
         [     3,      4,      5],
         [     6,      7,      8]],

        [[     9,     10,     11],
         [    12,     13,     14],
         [    15,     16,     17]],

        [[    18,     19,     20],
         [    21,     22,     23],
         [    24,     25,     26]]],

desired output:

array([[[[     0,      1,      2],
         [    48,      49,      50],
         [     96,      97,      98]],

etc etc

Upvotes: 2

Views: 1062

Answers (4)

tel
tel

Reputation: 13999

Solution

There's a live version of this solution online you can try for yourself

There's a simple (kind of) solution to your original problem of finding the maximum 3x3x3 subcube in a 50x50x50 cube that's based on changing the input array's strides. This solution is completely vectorized (meaning no looping), and so should get the best possible performance out of Numpy:

import numpy as np

def cubecube(arr, cshape):
    strides = (*arr.strides, *arr.strides)
    shape = (*np.array(arr.shape) - cshape + 1, *cshape)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

def maxcube(arr, cshape):
    cc = cubecube(arr, cshape)
    ccsums = cc.sum(axis=tuple(range(-arr.ndim, 0)))
    ix = np.unravel_index(np.argmax(ccsums), ccsums.shape)[:arr.ndim]
    return ix, cc[ix]

The maxcube function takes an array and the shape of the subcubes, and returns a tuple of (first-index-of-largest-cube, largest-cube). Here's an example of how to use maxcube:

shape = (50, 50, 50)
cshape = (3, 3, 3)

# set up a 50x50x50 array
arr = np.arange(np.prod(shape)).reshape(*shape)

# set one of the subcubes as the largest
arr[37, 26, 11] = 999999

ix, cube = maxcube(arr, cshape)
print('first index of largest cube: {}'.format(ix))
print('largest cube:\n{}'.format(cube))

which outputs:

first index of largest cube: (37, 26, 11)
largest cube:
[[[999999  93812  93813]
  [ 93861  93862  93863]
  [ 93911  93912  93913]]

 [[ 96311  96312  96313]
  [ 96361  96362  96363]
  [ 96411  96412  96413]]

 [[ 98811  98812  98813]
  [ 98861  98862  98863]
  [ 98911  98912  98913]]]

In depth explanation

A cube of cubes

What you have is a 48x48x48 cube, but what you want is a cube of smaller cubes. One can be converted to the other by altering its strides. For a 48x48x48 array of dtype int64, the stride will originally be set as (48*48*8, 48*8, 8). The first value of each non-overlapping 3x3x3 subcube can be iterated over with a stride of (3*48*48*8, 3*48*8, 3*8). Combine these strides to get the strides of the cube of cubes:

# Set up a 48x48x48 array, like in OP's example
arr = np.arange(48**3).reshape(48,48,48)

shape = (16,16,16,3,3,3)
strides = (3*48*48*8, 3*48*8, 3*8, 48*48*8, 48*8, 8)

# restride into a 16x16x16 array of 3x3x3 cubes
arr2 = np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

arr2 is a view of arr (meaning that they share data, so no copy needs to be made) with a shape of (16,16,16,3,3,3). The ijkth 3x3 cube in arr can be accessed by passing the indices to arr2:

i,j,k = 0,0,0
print(arr2[i,j,k])

Output:

[[[   0    1    2]
  [  48   49   50]
  [  96   97   98]]

 [[2304 2305 2306]
  [2352 2353 2354]
  [2400 2401 2402]]

 [[4608 4609 4610]
  [4656 4657 4658]
  [4704 4705 4706]]]

You can get the sums of all of the subcubes by just summing across the inner axes:

sumOfSubcubes = arr2.sum(3,4,5)

This will yield a 16x16x16 array in which each value is the sum of a non-overlapping 3x3x3 subcube from your original array. This solves the specific problem about the 48x48x48 array that the OP asked about. Restriding can also be used to find all of the overlapping 3x3x3 cubes, as in the cubecube function above.

Upvotes: 1

Paul Panzer
Paul Panzer

Reputation: 53029

Here is a cumsum based fast solution:

import numpy as np

nd = 3
cs = 3
N = 50

# create indices [cs-1:, ...], [:, cs-1:, ...], ...
fromcsm = *zip(*np.where(np.identity(nd, bool), np.s_[cs-1:], np.s_[:])),
# create indices [cs:, ...], [:, cs:, ...], ...
fromcs = *zip(*np.where(np.identity(nd, bool), np.s_[cs:], np.s_[:])),
# create indices [:cs, ...], [:, :cs, ...], ...
tocs = *zip(*np.where(np.identity(nd, bool), np.s_[:cs], np.s_[:])),
# create indices [:-cs, ...], [:, :-cs, ...], ...
tomcs = *zip(*np.where(np.identity(nd, bool), np.s_[:-cs], np.s_[:])),
# create indices [cs-1, ...], [:, cs-1, ...], ...
atcsm = *zip(*np.where(np.identity(nd, bool), cs-1, np.s_[:])),

def windowed_sum(a):
    out = a.copy()
    for i, (fcsm, fcs, tcs, tmcs, acsm) \
            in enumerate(zip(fromcsm, fromcs, tocs, tomcs, atcsm)):
        out[fcs] -= out[tmcs]
        out[acsm] = out[tcs].sum(axis=i)
        out = out[fcsm].cumsum(axis=i)
    return out

This returns the sums over all the sub cubes. We can then use argmax and unravel_index to get the offset of the maximum cube. Example:

np.random.seed(0)
a = np.random.randint(0,9,(N,N,N))
s = windowed_sum(a)
idx = np.unravel_index(np.argmax(s,), s.shape)

Upvotes: 0

b-fg
b-fg

Reputation: 4137

This is a solution without many numpy functions, just numpy.sum. First define a squared matrix and then the size of the cube cs you are going to perform the summation within. Just change cs to adjust the cube size and find other solutions. Following @Divakar suggestion, I have used a 4x4x4 array and I also store the location where the cube is location (just the vertex of the cube's origin)

import numpy as np
np.random.seed(0)
a = np.random.randint(0,9,(4,4,4))
print(a)
cs = 2 # Cube size
my_sum = 0
idx = None
for i in range(a.shape[0]-cs+2):
  for j in range(a.shape[1]-cs+2):
    for k in range(a.shape[2]-cs+2):
      cube_sum = np.sum(a[i:i+cs, j:j+cs, k:k+cs])
      print(cube_sum)
      if cube_sum > my_sum:
        my_sum = cube_sum
        idx = (i,j,k)
print(my_sum, idx) # 42 (0, 0, 0)

This 3D array a is

[[[5 0 3 3]
  [7 3 5 2]
  [4 7 6 8]
  [8 1 6 7]]

 [[7 8 1 5]
  [8 4 3 0]
  [3 5 0 2]
  [3 8 1 3]]

 [[3 3 7 0]
  [1 0 4 7]
  [3 2 7 2]
  [0 0 4 5]]

 [[5 6 8 4]
  [1 4 8 1]
  [1 7 3 6]
  [7 2 0 3]]]

And you get my_sum = 42 and idx = (0, 0, 0) for cs = 2. And my_sum = 112 and idx = (1, 0, 0) for cs = 3

Upvotes: 0

Halbeard
Halbeard

Reputation: 121

Your thought process with the 48x48x48 cube goes in the right direction insofar that there are 48³ different contiguous 3x3x3 cubes within the 50x50x50 array, though I don't understand why you would want to reshape it.

What you could do is add all 27 values of each 3x3x3 cube to a 48x48x48 dimensional array by going through all 27 permutations of adjacent slices and find the maximum over it. The found entry will give you the index tuple coordinate_index of the cube corner that is closest to the origin of your original array.

import numpy as np
np.random.seed(0)
array_shape = np.array((50,50,50), dtype=int)
cube_dim = np.array((3,3,3), dtype=int)
original_array = np.random.randint(array_shape)
reduced_shape = array_shape - cube_dim + 1

sum_array = np.zeros(reduced shape, dtype=int)
for i in range(cube_dim[0]):
  for j in range(cube_dim[1]):
    for k in range(cube_dim[2]):
      sum_array += original_array[
          i:-cube_dim[0]+1+i, j:-cube_dim[1]+1+j, k:-cube_dim[2]+1+k
      ]
flat_index = np.argmax(sum_array)
coordinate_index = np.unravel_index(flat_index, reduced_shape)

This method should be faster than looping over each of the 48³ index combinations to find the desired cube as it uses in place summation but in turn requires more memory. I'm not sure about it, but defining an (48³, 27) array with slices and using np.sum over the second axis could be even faster.

You can easily change the above code to find a cuboid with arbitrary side lengths instead.

Upvotes: 0

Related Questions