Tradeoffs between indexing numpy array and opening file in rasterio

Question

When using rasterio, I can get a single band of a raster in either of two ways:

import rasterio
import numpy as np

dataset = rasterio.open('filepath')

# Option 1: read every band into memory, then index the array (NumPy is 0-based)
image = dataset.read()
print(image.shape)
red_band = image[2, :, :]  # third band, at NumPy index 2
print(red_band.shape)

# Option 2: read only that band from the file (rasterio band indexes are 1-based)
red_band_read = dataset.read(3)
print(red_band_read.shape)

if np.array_equal(red_band_read, red_band):
    print('They are the same.')

And it will print out:

(8, 250, 250)
(250, 250)
(250, 250)
They are the same.

But I'm curious which is 'better'. I assume indexing into a numpy array is much faster than reading from a file, but holding some of these large satellite images fully in memory is prohibitively memory intensive. Are there any good reasons to do one or the other?

Charles Parr · Accepted Answer

You might try timing each method and see if there is a difference!

If all you need is the data from the red band, I would certainly use the latter method rather than reading all bands into memory and then slicing the red band off the larger array.

In a similar vein, if you already know the subset of the data you want to look at, you can use rasterio windowed reading and writing to further reduce memory consumption:
