Reputation: 447
I have a multidimensional numpy array that happens to be an array of images. Why does computing the image channel mean produce different results when using the axis argument to np.mean?
>>> import numpy as np
>>> X = np.array(np.random.random((9999, 128, 128, 4)) * 1e5, dtype='float32')
>>> X.shape
(9999, 128, 128, 4)
>>> mean_by_axis = np.mean(X, axis=(0, 1, 2))
>>> mean_by_axis
array([ 13423.11523438, 13423.11523438, 13423.11523438, 13423.11523438], dtype=float32)
>>> mean = np.mean(X[:, :, :, 0])
>>> mean
50001.297
I expect mean_by_axis[0] == mean. Why is this not the case? The same is true of the remaining axis-3 indices 1, 2 and 3. Am I misunderstanding how to use the axis argument in np.mean?
Using numpy version '1.12.1'
Is it possible that I am overflowing the float32 accumulator? For instance:
>>> X = np.random.random(size=(100, 128, 128, 4))
>>> np.mean(X, axis=(0, 1, 2))
array([ 0.49978557, 0.49985835, 0.50000321, 0.50015689])
>>> np.mean(X[:, :, :, 0])
0.49978556940636332
This looks about correct. If this is the case, why does the slice method not also overflow the accumulator and give the same result? Perhaps the slice method uses a float64 accumulator and the axis method uses a float32 accumulator?
Upvotes: 2
Views: 8046
Reputation: 56
Cutting to the chase ;) It looks like the simple answer is:
mean = np.mean(X, axis=(0, 1, 2, 3))
And given that your array (when I tried it) ranges from 0.0001488 to 99999.99959 and is uniformly distributed (np.random.random draws from a uniform distribution), a mean of about 50000 is reasonable.
Upvotes: 1
Reputation: 10759
I am unable to exactly reproduce your result since you do not provide your data, but with random data I can reproduce the issue:
>>> import numpy as np
>>> X = np.random.rand(9999, 128, 128, 4).astype('float32')
>>> X.shape
(9999, 128, 128, 4)
>>> np.mean(X, axis=(0, 1, 2))
array([ 0.10241024, 0.10241024, 0.10241024, 0.10241024], dtype=float32)
>>> np.mean(X[:, :, :, 0])
0.50000387
>>> np.mean(X[:, :, :, 0].flatten())
0.50000387
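This is likely a case of insufficient numerical precision. You are summing 9999*128*128 = 163823616 floating-point values per channel, and a float32 has a relative precision of only ~10^-7 (a 24-bit significand). Once the running sum is large enough, adding another small value no longer changes it, so you are well past the boundaries of that precision.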
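To make the precision limit concrete, here is a minimal sketch of my own (not taken from the NumPy source). It assumes the axis reduction adds values one at a time into a float32 running sum; I have not verified that in the implementation, but it is consistent with the numbers above.

```python
import numpy as np

# float32 has a 24-bit significand, so above 2**24 the spacing between
# representable values is 2 and adding anything below 1 is rounded away.
acc = np.float32(2**24)
print(acc + np.float32(0.75) == acc)  # the running sum no longer moves

# Sequential accumulation in float32 shows the same stall.
stalled = np.cumsum(np.array([2**24, 0.75, 0.75, 0.75], dtype=np.float32))
print(stalled)

# Assumption: the per-channel sum stalls at a power of two. Dividing that
# power of two by the element count gives the mean that would be reported.
n = 9999 * 128 * 128
stuck_mean = 2**24 / n      # values in [0, 1), as in this answer
question_mean = 2**41 / n   # values in [0, 1e5), as in the question
print(n, stuck_mean, question_mean)
```

Under that assumption, `stuck_mean` comes out at about 0.10241024 and `question_mean` at about 13423.115, which would explain why all four channels report the same value in both sessions.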
I would recommend you try casting your array to float64, which has higher precision, before calling mean and see what happens.
>>> np.mean(X.astype('float64'), axis=(0, 1, 2))
array([ 0.50000323, 0.50004907, 0.50003198, 0.49999848])
>>> np.mean(X[:, :, :, 0].astype('float64'))
0.50000323305421812
>>> np.mean(X[:, :, :, 0].flatten().astype('float64'))
0.50000323305421812
Upvotes: 1