Iterating through a numpy array and then indexing a value in another array

Question

I am struggling to get this code to work I want to iterate through an numpy array and based on the result, index to a value in another numpy array and then save that in a new position based on that value.

     # Convert the sediment transport and the flow direction rasters into Numpy arrays
    sediment_transport_np = arcpy.RasterToNumPyArray(sediment_transport_convert, '#', '#', '#', -9999)
    flow_direction_np = arcpy.RasterToNumPyArray(flow_direction_convert, '#', '#', '#', -9999)

    [rows,cols]= sediment_transport_np.shape
    elevation_change = np.zeros((rows,cols), np.float)

   # Main body for calculating elevation change        

    # Attempt 1
    for [i, j], flow in np.ndenumerate(flow_direction_np):
        if flow == 32:
            elevation_change[i, j]  = sediment_transport_np[i - 1, j - 1]
        elif flow == 16:
            elevation_change[i, j] = sediment_transport_np[i, j - 1]
        elif flow == 8:
            elevation_change[i, j] = sediment_transport_np[i + 1, j - 1]
        elif flow == 4:
            elevation_change[i, j] = sediment_transport_np[i + 1, j]
        elif flow == 64:
            elevation_change[i, j] = sediment_transport_np[i - 1, j]
        elif flow == 128:
            elevation_change[i, j] = sediment_transport_np[i - 1, j + 1]
        elif flow == 1:
            elevation_change[i, j] = sediment_transport_np[i, j + 1]
        elif flow == 2:
            elevation_change[i, j] = sediment_transport_np[i + 1, j + 1]

Convert the Numpy arrays back to rasters

    elevation_change_raster = arcpy.NumPyArrayToRaster(elevation_change, bottom_left_corner, raster_cell_width, raster_cell_height, -9999)

    elevation_change_raster.save(output_raster)

The error I get is:

Running script elevation_change...

Traceback (most recent call last): File "", line 606, in execute IndexError: index (655) out of range (0<=index<655) in dimension 0

Failed to execute (elevation_change)

Joe Kington · Accepted Answer

Cause of the Problem

The error is because you're trying to index beyond the bounds of the sediment_transport grid (e.g. the i+1 and j+1 portions). Right now, you're trying to get a value that doesn't exist when you're at a boundary of the grid. Also, it's not raising an error, but you're currently grabbing the opposite edge when you're at i=0 or j=0 (due to the i-1 and j-1 parts).

You mentioned that you wanted the values of elevation_change to be 0 at the boundaries (which certainly seems reasonable). Another common boundary condition is to "wrap" the values and grab a value from the opposite edge. It probably doesn't make much sense in this case, but I'll show it in a couple of examples because it's easy to implement with some of the methods.

Tempting but Incorrect

It's tempting to just catch the exception and set the value to 0. For example:

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            ...
        elif ...
            ...
    except IndexError:
        elevation_change[i, j] = 0

However, this approach is actually incorrect. Negative indexing is valid, and will return the opposite edge of the grid. Therefore, this would basically implement a "zero" boundary condition on the right and bottom edges of the grid, and a "wrap-around" boundary condition on the left and top edges.

Padding with zeros

In the case of "zero" boundary conditions, there's a very simple way to avoid indexing problems: Pad the sediment_transport grid with zeros. This way, if we index beyond the edge of the original grid, we'll get a 0. (Or whatever constant value you'd like to pad the array with.)

Side note: This is the perfect place to use numpy.pad. However, it was added in v1.7. I'm going to skip using it here, as the OP mentions ArcGIS, and Arc doesn't ship with an up-to-date version of numpy.

For example:

padded_transport = np.zeros((rows + 2, cols + 2), float)
padded_transport[1:-1, 1:-1] = sediment_transport  
# The two lines above could be replaced with:
#padded_transport = np.pad(sediment_transport, 1, mode='constant')

for [i, j], flow in np.ndenumerate(flow_direction):
    # Need to take into account the offset in the "padded_transport"
    r, c = i + 1, j + 1

    if flow == 32:
        elevation_change[i, j] = padded_transport[r - 1, c - 1]
    elif flow == 16:
        elevation_change[i, j] = padded_transport[r, c - 1]
    elif flow == 8:
        elevation_change[i, j] = padded_transport[r + 1, c - 1]
    elif flow == 4:
        elevation_change[i, j] = padded_transport[r + 1, c]
    elif flow == 64:
        elevation_change[i, j] = padded_transport[r - 1, c]
    elif flow == 128:
        elevation_change[i, j] = padded_transport[r - 1, c + 1]
    elif flow == 1:
        elevation_change[i, j] = padded_transport[r, c + 1]
    elif flow == 2:
        elevation_change[i, j] = padded_transport[r + 1, c + 1]

DRY (Don't Repeat Yourself)

We can write this code a bit more compactly by using a dict:

elevation_change = np.zeros_like(sediment_transport)
nrows, ncols = flow_direction.shape
lookup = {32: (-1, -1),
          16:  (0, -1), 
          8:   (1, -1),
          4:   (1,  0),
          64: (-1,  0),
          128:(-1,  1),
          1:   (0,  1),
          2:   (1,  1)}

padded_transport = np.zeros((nrows + 2, ncols + 2), float)
padded_transport[1:-1, 1:-1] = sediment_transport  

for [i, j], flow in np.ndenumerate(flow_direction):
    # Need to take into account the offset in the "padded_transport"
    r, c = i + 1, j + 1
    # This also allows for flow_direction values not listed above...
    dr, dc = lookup.get(flow, (0,0))
    elevation_change[i,j] = padded_transport[r + dr, c + dc]

At this point, it's a bit superfluous to pad the original array. Implement different boundary conditions by padding is very easy if you use numpy.pad, but we could just write the logic out directly:

elevation_change = np.zeros_like(sediment_transport)
nrows, ncols = flow_direction.shape
lookup = {32: (-1, -1),
          16:  (0, -1), 
          8:   (1, -1),
          4:   (1,  0),
          64: (-1,  0),
          128:(-1,  1),
          1:   (0,  1),
          2:   (1,  1)}

for [i, j], flow in np.ndenumerate(flow_direction):
    dr, dc = lookup.get(flow, (0,0))
    r, c = i + dr, j + dc
    if not ((0 <= r < nrows) & (0 <= c < ncols)):
        elevation_change[i,j] = 0
    else:
        elevation_change[i,j] = sediment_transport[r, c]

"Vectorizing" the Calculation

Iterating through numpy arrays in python is rather slow for reasons I won't delve into here. Therefore, there are more efficient ways to implement this in numpy. The trick is to use numpy.roll along with boolean indexing.

For "wrap-around" boundary conditions, it's as simple as:

elevation_change = np.zeros_like(sediment_transport)
nrows, ncols = flow_direction.shape
lookup = {32: (-1, -1),
          16:  (0, -1), 
          8:   (1, -1),
          4:   (1,  0),
          64: (-1,  0),
          128:(-1,  1),
          1:   (0,  1),
          2:   (1,  1)}

for value, (row, col) in lookup.iteritems():
    mask = flow_direction == value
    shifted = np.roll(mask, row, 0)
    shifted = np.roll(shifted, col, 1)
    elevation_change[mask] = sediment_transport[shifted]

return elevation_change

If you're not familiar with numpy, this probably looks a bit like greek. There are two parts to this. The first is using boolean indexing. As a quick example of what this does:

In [1]: import numpy as np

In [2]: x = np.arange(9).reshape(3,3)

In [3]: x
Out[3]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [4]: mask = np.array([[False, False, True],
...                      [True, False, False],
...                      [True, False, False]])


In [5]: x[mask]
Out[5]: array([2, 3, 6])

As you can see, if we index an array with a boolean grid of the same shape, the values where it is True, will be returned. Similarly, you can set values this way.

The next trick is numpy.roll. This will shift the values by a given amount in one direction. They'll "wrap-around" at the edges.

In [1]: import numpy as np

In [2]: np.array([[0,0,0],[0,1,0],[0,0,0]])
Out[2]: 
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])

In [3]: x = _

In [4]: np.roll(x, 1, axis=0)
Out[4]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 1, 0]])

In [5]: np.roll(x, 1, axis=1)
Out[5]: 
array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 0]])

Hopefully that makes a bit of sense, at any rate.

To implement "zero" boundary conditions (or arbitrary boundary conditions using numpy.pad), we'd do something like this:

def vectorized(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    nrows, ncols = flow_direction.shape
    lookup = {32: (-1, -1),
              16:  (0, -1), 
              8:   (1, -1),
              4:   (1,  0),
              64: (-1,  0),
              128:(-1,  1),
              1:   (0,  1),
              2:   (1,  1)}

    # Initialize an array for the "shifted" mask
    shifted = np.zeros((nrows+2, ncols+2), dtype=bool)

    # Pad "sediment_transport" with zeros
    # Again, `np.pad` would be better and more flexible here, as it would
    # easily allow lots of different boundary conditions...
    tmp = np.zeros((nrows+2, ncols+2), sediment_transport.dtype)
    tmp[1:-1, 1:-1] = sediment_transport
    sediment_transport = tmp

    for value, (row, col) in lookup.iteritems():
        mask = flow_direction == value

        # Reset the "shifted" mask
        shifted.fill(False)
        shifted[1:-1, 1:-1] = mask

        # Shift the mask by the right amount for the given value
        shifted = np.roll(shifted, row, 0)
        shifted = np.roll(shifted, col, 1)

        # Set the values in elevation change to the offset value in sed_trans
        elevation_change[mask] = sediment_transport[shifted]

    return elevation_change

Advantage to Vectorization

The "vectorized" version is much faster, but will use more RAM.

For a 1000 by 1000 grid:

In [79]: %timeit vectorized(flow_direction, sediment_transport)
10 loops, best of 3: 170 ms per loop

In [80]: %timeit iterate(flow_direction, sediment_transport)
1 loops, best of 3: 5.36 s per loop

In [81]: %timeit lookup(flow_direction, sediment_transport)
1 loops, best of 3: 3.4 s per loop

These results are from comparing the following implementations with randomly generated data:

import numpy as np

def main():
    # Generate some random flow_direction and sediment_transport data...
    nrows, ncols = 1000, 1000
    flow_direction = 2 ** np.random.randint(0, 8, (nrows, ncols))
    sediment_transport = np.random.random((nrows, ncols))

    # Make sure all of the results return the same thing...
    test1 = vectorized(flow_direction, sediment_transport)
    test2 = iterate(flow_direction, sediment_transport)
    test3 = lookup(flow_direction, sediment_transport)
    assert np.allclose(test1, test2)
    assert np.allclose(test2, test3)


def vectorized(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    sediment_transport = np.pad(sediment_transport, 1, mode='constant')
    lookup = {32: (-1, -1),
              16:  (0, -1), 
              8:   (1, -1),
              4:   (1,  0),
              64: (-1,  0),
              128:(-1,  1),
              1:   (0,  1),
              2:   (1,  1)}

    for value, (row, col) in lookup.iteritems():
        mask = flow_direction == value
        shifted = np.pad(mask, 1, mode='constant')
        shifted = np.roll(shifted, row, 0)
        shifted = np.roll(shifted, col, 1)
        elevation_change[mask] = sediment_transport[shifted]

    return elevation_change

def iterate(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    padded_transport = np.pad(sediment_transport, 1, mode='constant')  

    for [i, j], flow in np.ndenumerate(flow_direction):
        r, c = i + 1, j + 1
        if flow == 32:
            elevation_change[i, j] = padded_transport[r - 1, c - 1]
        elif flow == 16:
            elevation_change[i, j] = padded_transport[r, c - 1]
        elif flow == 8:
            elevation_change[i, j] = padded_transport[r + 1, c - 1]
        elif flow == 4:
            elevation_change[i, j] = padded_transport[r + 1, c]
        elif flow == 64:
            elevation_change[i, j] = padded_transport[r - 1, c]
        elif flow == 128:
            elevation_change[i, j] = padded_transport[r - 1, c + 1]
        elif flow == 1:
            elevation_change[i, j] = padded_transport[r, c + 1]
        elif flow == 2:
            elevation_change[i, j] = padded_transport[r + 1, c + 1]
    return elevation_change

def lookup(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    nrows, ncols = flow_direction.shape
    lookup = {32: (-1, -1),
              16:  (0, -1), 
              8:   (1, -1),
              4:   (1,  0),
              64: (-1,  0),
              128:(-1,  1),
              1:   (0,  1),
              2:   (1,  1)}

    for [i, j], flow in np.ndenumerate(flow_direction):
        dr, dc = lookup.get(flow, (0,0))
        r, c = i + dr, j + dc
        if not ((0 <= r < nrows) & (0 <= c < ncols)):
            elevation_change[i,j] = 0
        else:
            elevation_change[i,j] = sediment_transport[r, c]

    return elevation_change

if __name__ == '__main__':
    main()