Why do these two numpy.divide operations give such different results?

Question

I would like to correct the values in hyperspectral readings from a camera using the formula described over here:

the dark reference is subtracted from the captured data, and the result is divided by the white reference minus the dark reference.

In the original example the task is rather simple: the white and dark references have the same shape as the main data, so the formula is executed as:

corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr),
np.subtract(white_nparr, dark_nparr))

However, the main data is much larger in my case. The shapes are as follows:

$ white_nparr.shape, dark_nparr.shape, data_nparr.shape
((100, 640, 224), (100, 640, 224), (4300, 640, 224))

That's why I repeat the reference arrays:

white_nparr_rep = white_nparr.repeat(43, axis=0)
dark_nparr_rep = dark_nparr.repeat(43, axis=0)
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr_rep),
                            np.subtract(white_nparr_rep, dark_nparr_rep))

And it works almost perfectly, as can be seen in the image on the left. But this approach requires an enormous amount of memory, so I decided to traverse the large array and replace the original values with the corrected ones on the go instead:

ref_scale = dark_nparr.shape[0]
data_scale = data_nparr.shape[0]

for i in range(data_scale // ref_scale):
    block = slice(i * ref_scale, (i + 1) * ref_scale)
    # Overwrite each block of ref_scale frames with its corrected values.
    data_nparr[block] = np.divide(
        np.subtract(data_nparr[block], dark_nparr),
        np.subtract(white_nparr, dark_nparr))

But that traversal approach gives me the ugliest of results, as can be seen in the image on the right. I'd appreciate any ideas that would help me fix this.

Note: I apply 20-times co-adding (mean of 20 readings) to obtain the images below.

EDIT: The dtype of each array is as follows:

$ white_nparr.dtype, dark_nparr.dtype, data_nparr.dtype
(dtype('float32'), dtype('float32'), dtype('float32'))

Warren Weckesser · Accepted Answer

Your two methods don't agree because in the first method you used

   white_nparr_rep = white_nparr.repeat(43, axis=0)

but the second method corresponds to using

   white_nparr_rep = np.tile(white_nparr, (43, 1, 1))

If the first method is correct, you'll have to adjust the second method to act accordingly. Perhaps

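reps = data_scale // ref_scale  # 43 data frames per reference frame

# repeat() pairs reference frame i with data frames i*reps .. (i+1)*reps - 1
for i in range(ref_scale):
    block = slice(i * reps, (i + 1) * reps)
    data_nparr[block] = np.divide(
        np.subtract(data_nparr[block], dark_nparr[i]),
        np.subtract(white_nparr[i], dark_nparr[i]))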
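The repeat/tile correspondence can be checked on small arrays. The following is a minimal sketch, assuming toy shapes (3 reference frames, 6 data frames) and random values rather than the real hyperspectral data; the helper names correct_in_slices, correct_per_reference and correct_full are hypothetical and exist only for this comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real arrays (assumed shapes, random values).
ref_scale, reps = 3, 2
dark = rng.random((ref_scale, 4, 5)).astype(np.float32)
# Keep white - dark >= 1 so the division is always well defined.
white = dark + 1.0 + rng.random((ref_scale, 4, 5)).astype(np.float32)
data = rng.random((ref_scale * reps, 4, 5)).astype(np.float32)


def correct_in_slices(data, white, dark):
    """Loop from the question: every block of n frames uses all n reference frames."""
    out = data.copy()
    n = dark.shape[0]
    for i in range(data.shape[0] // n):
        block = slice(i * n, (i + 1) * n)
        out[block] = (out[block] - dark) / (white - dark)
    return out


def correct_per_reference(data, white, dark):
    """Reference frame i is applied to data frames i*r .. (i+1)*r - 1."""
    out = data.copy()
    n = dark.shape[0]
    r = data.shape[0] // n
    for i in range(n):
        block = slice(i * r, (i + 1) * r)
        out[block] = (out[block] - dark[i]) / (white[i] - dark[i])
    return out


def correct_full(data, white_full, dark_full):
    """Correction with reference arrays already expanded to the data's shape."""
    return (data - dark_full) / (white_full - dark_full)


via_repeat = correct_full(data, white.repeat(reps, axis=0), dark.repeat(reps, axis=0))
via_tile = correct_full(data, np.tile(white, (reps, 1, 1)), np.tile(dark, (reps, 1, 1)))

slices_match_tile = np.allclose(correct_in_slices(data, white, dark), via_tile)
per_ref_matches_repeat = np.allclose(correct_per_reference(data, white, dark), via_repeat)
repeat_matches_tile = np.allclose(via_repeat, via_tile)

print(slices_match_tile, per_ref_matches_repeat, repeat_matches_tile)
```

With independent random reference frames, the first two comparisons should agree and the last should not, which is the mismatch described in the question.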

A simple example with 2-d arrays that shows the difference between repeat and tile:

In [146]: z
Out[146]: 
array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])

In [147]: np.repeat(z, 3, axis=0)
Out[147]: 
array([[ 1,  2,  3,  4,  5],
       [ 1,  2,  3,  4,  5],
       [ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [11, 12, 13, 14, 15],
       [11, 12, 13, 14, 15]])

In [148]: np.tile(z, (3, 1))
Out[148]: 
array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])
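The same difference can be stated as an index mapping, which may help when deciding which reference frame belongs to which data frame. A small sketch, reusing the z array from the example above; the names repeat_src and tile_src are made up for illustration.

```python
import numpy as np

z = np.array([[1, 2, 3, 4, 5],
              [11, 12, 13, 14, 15]])
n = 3
rows = np.arange(z.shape[0] * n)

# repeat: output row j is a copy of input row j // n
repeat_src = rows // n
# tile: output row j is a copy of input row j % z.shape[0]
tile_src = rows % z.shape[0]

print(repeat_src)  # [0 0 0 1 1 1]
print(tile_src)    # [0 1 0 1 0 1]

# Fancy indexing with these row indices reproduces both functions.
assert np.array_equal(np.repeat(z, n, axis=0), z[repeat_src])
assert np.array_equal(np.tile(z, (n, 1)), z[tile_src])
```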

Off-topic postscript: I don't know why the author of the page that you linked to writes NumPy expressions as (for example):

corrected_nparr = np.divide(
    np.subtract(data_nparr, dark_nparr),
    np.subtract(white_nparr, dark_nparr))

NumPy allows you to write that as

corrected_nparr = (data_nparr - dark_nparr) / (white_nparr - dark_nparr)

which looks much nicer to me.
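The operator form also works in place and with broadcasting, which offers another way to avoid the large repeated arrays from the question. This is only a sketch under several assumptions: toy shapes stand in for the real (100, 640, 224) and (4300, 640, 224) arrays, the data array is C-contiguous, its first dimension is an exact multiple of the reference's, and the repeat pairing is the intended one.

```python
import numpy as np

rng = np.random.default_rng(1)
ref_scale, reps, h, w = 4, 3, 2, 5  # toy sizes, not the real (100, 43, 640, 224)
dark = rng.random((ref_scale, h, w)).astype(np.float32)
white = dark + 1.0 + rng.random((ref_scale, h, w)).astype(np.float32)
data = rng.random((ref_scale * reps, h, w)).astype(np.float32)

# Reference result using the memory-hungry repeat() approach.
expected = (data - dark.repeat(reps, axis=0)) / (
    white.repeat(reps, axis=0) - dark.repeat(reps, axis=0))

# Group the data frames as (ref_scale, reps, h, w).
# For a C-contiguous array this reshape is a view, not a copy.
grouped = data.reshape(ref_scale, reps, h, w)
assert np.shares_memory(grouped, data)

# The new axis broadcasts each reference frame over its reps data frames.
# In-place operators write through the view into data itself.
grouped -= dark[:, np.newaxis]
grouped /= (white - dark)[:, np.newaxis]

print(np.allclose(data, expected))
```

The only extra allocation here is white - dark, which has the size of one reference array rather than the size of the full data.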
