bfb

Reputation: 447

numpy mean of multidimensional array

I have a multidimensional numpy array that happens to be an array of images. Why does computing the image channel mean produce different results when using the axis argument to np.mean?

>>> X = np.array(np.random.random((9999, 128, 128, 4)) * 1e5, dtype='float32')
>>> X.shape
(9999, 128, 128, 4)
>>> mean_by_axis = np.mean(X, axis=(0, 1, 2))
>>> mean_by_axis
array([ 13423.11523438,  13423.11523438,  13423.11523438,  13423.11523438], dtype=float32)
>>> mean = np.mean(X[:, :, :, 0])
>>> mean
50001.297

I expect mean_by_axis[0] == mean. Why is this not the case? The same is true of the remaining axis-3 indices 1, 2 and 3. Am I misunderstanding how to use the axis argument in np.mean?

Using numpy version '1.12.1'

Is it possible that I am overflowing the float32 accumulator? For instance:

>>> X = np.random.random(size=(100, 128, 128, 4))
>>> np.mean(X, axis=(0, 1, 2))
array([ 0.49978557,  0.49985835,  0.50000321,  0.50015689])
>>> np.mean(X[:, :, :, 0])
0.49978556940636332

This looks about correct. If this is the case, why does the slice method not also overflow the accumulator and give the same result? Perhaps the slice method uses a float64 accumulator and the axis method uses a float32 accumulator?

Upvotes: 2

Views: 8046

Answers (2)

Karen Fisher

Reputation: 56

Cutting to the chase. ;) It looks like the simple answer is:

mean = np.mean(X, axis=(0, 1, 2, 3))

Given that your array (when I tried it) ranges from 0.0001488 to 99999.99959, and that np.random.random draws from a uniform distribution, a mean of about 50000 is reasonable.

Upvotes: 1

Jonas Adler

Reputation: 10759

I am unable to exactly reproduce your result since you do not provide your data, but with random data I can reproduce the issue:

>>> import numpy as np
>>> X = np.random.rand(9999, 128, 128, 4).astype('float32')
>>> X.shape
(9999, 128, 128, 4)
>>> np.mean(X, axis=(0, 1, 2))
array([ 0.10241024,  0.10241024,  0.10241024,  0.10241024], dtype=float32)
>>> np.mean(X[:, :, :, 0])
0.50000387
>>> np.mean(X[:, :, :, 0].flatten())
0.50000387

This is likely a case of insufficient numerical precision. For each channel you are summing 9999*128*128 = 163823616 floating-point values, and the relative precision of a float32 is only ~10^-7, so a running float32 sum over that many values is pushed to the limit of what the type can represent.
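To make that limit concrete, here is a small sketch of my own (the variable names are made up for illustration). It assumes the reduction over axis=(0, 1, 2) keeps a plain running float32 total, which is a guess about NumPy's internals and not something the output above proves. Under that assumption, a total of values in [0, 1) can no longer grow once it reaches 2**24, because the spacing between neighbouring float32 values is then 2. Dividing 2**24 by the element count gives the same 0.10241024 seen above.

```python
import numpy as np

# Every integer up to 2**24 is exactly representable in float32; above it the spacing is 2.
limit = np.float32(2**24)

# Adding 1.0 to a float32 total that has reached 2**24 is rounded away.
stuck = limit + np.float32(1.0)

# The same addition carried out in float64 behaves as expected.
fine = np.float64(limit) + 1.0

# A total stuck at 2**24, divided by the number of summed elements per channel.
n = 9999 * 128 * 128
saturated_mean = 2**24 / n

print(stuck == limit)
print(fine - np.float64(limit))
print(round(saturated_mean, 8))
```

The same arithmetic fits the numbers in the question: 2**41 / 163823616 is roughly 13423.115, so that total appears to have stalled at 2**41.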

I would recommend you try casting your array to float64, which has higher precision, before calling mean and see what happens.

>>> np.mean(X.astype('float64'), axis=(0, 1, 2))
array([ 0.50000323,  0.50004907,  0.50003198,  0.49999848])
>>> np.mean(X[:, :, :, 0].astype('float64'))
0.50000323305421812
>>> np.mean(X[:, :, :, 0].flatten().astype('float64'))
0.50000323305421812
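Instead of casting the whole array, you can also pass dtype to np.mean, which sets the type used to compute the mean and should avoid building a full float64 copy of X. A minimal sketch, using a smaller random array than the one in the question so it runs quickly (the shape and seed are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the sketch repeatable
X = np.random.rand(200, 64, 64, 4).astype('float32')

# Accumulate in float64 while X itself stays float32.
mean_by_axis = np.mean(X, axis=(0, 1, 2), dtype=np.float64)
mean_by_slice = np.mean(X[:, :, :, 0], dtype=np.float64)

print(mean_by_axis)
print(mean_by_slice)
print(np.isclose(mean_by_axis[0], mean_by_slice))
```

With a float64 accumulator the per-channel mean and the mean of the slice should agree to within rounding error.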

Upvotes: 1
