Reputation: 36239
In the course of tracking down a related problem I stumbled upon the fact that np.std seems to return different values depending on whether the axis keyword argument is specified or the corresponding slicing is done manually. Consider the following snippet:
import numpy as np
np.random.seed(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()
a[:, 1] = np.random.uniform()
print(np.std(a, axis=0)[0] == np.std(a[:, 0])) # Should be the same.
print(np.std(a, axis=0)[1] == np.std(a[:, 1])) # Should be the same.
However, the two computations don't return the same result. Further inspection reveals:
>>> print('axis=0: {:e} vs {:e}'.format(np.std(a, axis=0)[0], np.std(a[:, 0])))
axis=0: 7.771561e-16 vs 2.220446e-16
>>> print('axis=1: {:e} vs {:e}'.format(np.std(a, axis=0)[1], np.std(a[:, 1])))
axis=1: 4.440892e-16 vs 0.000000e+00
I don't see why the two computations would return different results, since formally they describe the same procedure: selecting the column manually or letting NumPy do the job by specifying axis shouldn't make a difference.
I am using Python 3.5.2 and numpy 1.15.0.
Upvotes: 3
Views: 137
Reputation: 7210
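These numbers, as you may have noticed, are quite small: at most 3.5 times the float64 machine epsilon (about 2.22e-16). So small, in fact, that neither is particularly accurate, and minor differences in implementation will produce different answers due to floating-point rounding. NumPy's std computes its mean and its sum of squared deviations with np.add.reduce, the C-level reduction behind np.sum, and that reduction is carried out differently when you pass axis than when you call it on an explicitly sliced column. The np.sum documentation notes that the more accurate pairwise summation is always used when no axis is given, but along a given axis only when that axis is the fast axis in memory; for a C-ordered (100, 2) array, axis=0 is not.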
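To make "different summation order, different rounding" concrete, here is a small pure-Python sketch that does not call NumPy at all. It shows that floating-point addition is not associative and contrasts plain left-to-right accumulation with a simplified pairwise summation, using math.fsum as a correctly rounded reference. The functions naive_sum and pairwise_sum are hypothetical illustrations; NumPy's real pairwise routine is a blocked, unrolled C loop and differs in detail.

```python
import math


def naive_sum(values):
    """Hypothetical helper: add values one after another, left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values, lo=0, hi=None, block=8):
    """Hypothetical helper: simplified pairwise summation.

    The index range is split in half recursively and the two partial sums
    are added; short ranges fall back to plain accumulation. This only
    mimics the idea behind NumPy's C implementation, not its exact code.
    """
    if hi is None:
        hi = len(values)
    if hi - lo <= block:
        total = 0.0
        for i in range(lo, hi):
            total += values[i]
        return total
    mid = (lo + hi) // 2
    return pairwise_sum(values, lo, mid, block) + pairwise_sum(values, mid, hi, block)


# Floating-point addition is not associative: the grouping changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same numbers, different summation order, different rounding error.
data = [0.1] * 10**6
reference = math.fsum(data)  # correctly rounded sum, used as the yardstick
err_naive = abs(naive_sum(data) - reference)
err_pairwise = abs(pairwise_sum(data) - reference)
print(err_naive > err_pairwise)  # True
```

Both functions add exactly the same numbers; only the order of the additions differs, which is enough to change the rounding error.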
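The 'real' standard deviation of each column is, of course, exactly 0: np.random.uniform() returns a single scalar, which is broadcast to all 100 rows, so every entry in a column is identical.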
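A sketch that reproduces the setup from the question, checks that each column really is constant, expresses both results in units of machine epsilon, and compares them with np.allclose instead of ==. The names by_axis, by_slice and tol are illustrative, and the tolerance 1e-12 is an arbitrary choice that is far above the rounding noise seen here; an absolute tolerance is needed because the expected value is 0, where a purely relative tolerance cannot help.

```python
import numpy as np

np.random.seed(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()  # a single scalar, broadcast to all 100 rows
a[:, 1] = np.random.uniform()

# Every row equals the first row, so each column is constant and its exact std is 0.
print(bool((a == a[0]).all()))  # True

by_axis = np.std(a, axis=0)  # one reduction along axis 0
by_slice = np.array([np.std(a[:, j]) for j in range(a.shape[1])])  # per-column slices

eps = np.finfo(float).eps  # spacing of float64 values near 1.0, about 2.22e-16
print(by_axis / eps, by_slice / eps)  # both are at most a few eps

# Compare with an absolute tolerance instead of exact equality.
tol = 1e-12  # illustrative choice
print(np.allclose(by_axis, by_slice, rtol=0.0, atol=tol))  # True
print(np.allclose(by_axis, 0.0, rtol=0.0, atol=tol))  # True
```

Exact equality between two floating-point results is only meaningful when both are computed in exactly the same way; otherwise a tolerance-based comparison is the safer default.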
Upvotes: 1