Reputation: 1467
Problem:
I stumbled across a problem that has deep (and very impactful) consequences in the industry, and it does not seem to be documented anywhere.
It seems that in Python, matrix multiplication (using either @ or np.matmul) gives different answers for C-contiguous and F-contiguous arrays:
import platform
print(platform.platform()) # Linux-5.10.0-23-cloud-amd64-x86_64-with-glibc2.31
import numpy as np
print(np.__version__) # 1.23.5
np.random.seed(0) # Will work with most seeds, I believe
M, N = 10, 5
X_c = np.random.normal(size=M*N).reshape(M,N)
X_f = np.asfortranarray(X_c)
assert (X_c == X_f).all()
p = np.random.normal(size=N)
assert (X_c @ p == X_f @ p).all() # FAIL
To the best of your knowledge, is this problem documented anywhere?
Example of Consequence:
In some cases, these errors can become huge:
Example: in the sklearn.linear_model.LinearRegression class, the fit method can lead to very wrong parameters when the matrix X is near-singular and F-contiguous (typically because it came out of a pandas DataFrame). This can lead to predictions that are all over the place and a negative R².
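A rough sketch of the kind of setup I mean (the way the near-singular matrix is constructed here is just one hypothetical way to provoke the effect; the size of the discrepancy will depend on the solver and library versions):
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n_samples, n_features = 200, 5
base = rng.normal(size=(n_samples, n_features))
# make X near-singular by appending an almost duplicated column
X_c = np.hstack([base, base[:, :1] + 1e-12 * rng.normal(size=(n_samples, 1))])
X_f = np.asfortranarray(X_c)   # same values, different memory layout
y = rng.normal(size=n_samples)

coef_c = LinearRegression().fit(X_c, y).coef_
coef_f = LinearRegression().fit(X_f, y).coef_
print(np.abs(coef_c - coef_f).max())   # may be far from 0 when X is this ill-conditioned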
Upvotes: 5
Views: 204
Reputation: 5451
As I see it, the problem that you noticed is a consequence of the fundamental limitations of floating point arithmetic; in particular, of the fact that floating point addition is not associative.
Consider the example from the code below, in which we want to sum all elements of an array a. We will do this in two different orders: sum_1 sums all values from left to right, while sum_2 progressively sums all neighboring pairs of values until only one value remains. Mathematically, this should not make a difference, as summation is an associative operation on real numbers. It is not associative on the floating point numbers of a computer, however, as we see from the output of the last line:
import numpy as np
eps_half = np.finfo(float).eps / 2
a = np.asarray([0.5, 0.5, eps_half, eps_half], dtype=float)
sum_1 = ((a[0] + a[1]) + a[2]) + a[3]
sum_2 = (a[0] + a[1]) + (a[2] + a[3])
print(sum_1, "versus", sum_2)
# >>> 1.0 versus 1.0000000000000002
What is going on here? Notice the use of eps, which is defined as the difference between 1.0 and the smallest representable float larger than 1.0. For the float type, at least on my system, its value is ca. 2e-16, i.e. 0.0000000000000002. The definition implies that, if we add a value of eps/2 or smaller to 1.0, the rounded result will still be 1.0!
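To make this concrete, here is a quick check; the printed values assume the usual IEEE-754 double precision that numpy's float64 uses on typical systems:
import numpy as np
eps = np.finfo(float).eps
print(eps)                    # 2.220446049250313e-16 on a typical IEEE-754 system
print(1.0 + eps == 1.0)       # False: 1.0 + eps is the next representable float after 1.0
print(1.0 + eps / 2 == 1.0)   # True: eps/2 is absorbed when added to 1.0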
In the constructed example from above, the latter happens in the calculation of sum_1 but not of sum_2, as we can see from their calculation steps:

sum_1:
1. 0.5 + 0.5 → 1.0 (add a[0] and a[1]: works as expected)
2. 1.0 + eps/2 → 1.0 (add a[2] to the previous sum: eps/2 is too small for the result to be represented as its own float value larger than 1.0, so the sum remains 1.0)
3. 1.0 + eps/2 → 1.0 (add a[3] to the previous sum: same problem as in step 2)

sum_2:
1. 0.5 + 0.5 → 1.0 (add a[0] and a[1]: works as expected)
2. eps/2 + eps/2 → eps (add a[2] and a[3]: works as expected)
3. 1.0 + eps → 1.0000000000000002 (add the results from steps 1 and 2: works as expected, because eps is large enough for the result to be represented as its own float value larger than 1.0)
You might ask yourself what your observation has to do with my example from above. Here is what is most likely going on: for each element in the result vector of a matrix-vector product, as calculated in your code, we essentially compute a dot product of two vectors: (1) the corresponding row of the given matrix and (2) the given vector. The dot product, by definition, multiplies the corresponding elements of the two vectors and then sums these intermediate results. So the only place where identical input values can produce different results in a matrix-vector product is this summation; or rather, its order, as the last step of each dot product. And here is the thing:
depending on whether the matrix is stored C-contiguously or F-contiguously, NumPy (or rather the underlying BLAS routine it dispatches to) may traverse memory, and thus sum, in a different order, which can change the last bits of the result. Note that, in your case, this is most likely not exactly the problem of adding values smaller than eps, but a general problem of summing large and small floating point values.
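Just to illustrate that the summation order alone can change the last bits of a dot product (the vector length and seed below are arbitrary choices of mine, and whether the comparisons actually print False depends on the generated values):
import numpy as np
rng = np.random.default_rng(0)
row = rng.normal(size=1000)     # stands in for one row of the matrix
vec = rng.normal(size=1000)     # stands in for the vector p
prods = row * vec               # the elementwise products that a dot product sums up
left_to_right = 0.0
for value in prods:             # plain left-to-right summation
    left_to_right += value
print(left_to_right == np.sum(prods))   # may be False: np.sum uses pairwise summation
print(left_to_right == row @ vec)       # may be False: BLAS may sum in yet another order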
I try to summarize some points that should serve as key takeaways:

- The difference in your example is tiny; its maximum is
print(np.abs(X_c @ p - X_f @ p).max())
# >>> 2.220446049250313e-16
… which is very small (incidentally, it is the exact value of eps from above). This does not mean that such imprecisions should be neglected, e.g. they may amplify over the course of further calculations.
- This is one more reason to prefer np.allclose() over exact equality comparison, as in the snippet below.
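For instance, the failing assertion from the question could be rewritten like this (with the default tolerances of np.allclose(), which are generous enough for a difference on the order of eps):
# instead of: assert (X_c @ p == X_f @ p).all()
assert np.allclose(X_c @ p, X_f @ p)   # compares up to a relative/absolute tolerance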
I admit, I somewhat doubt your final conclusion, namely that the huge errors you see in sklearn.linear_model.LinearRegression have the same cause; but maybe they are the result of the aforementioned amplification effects. Maybe you should provide a minimal reproducible example for further investigation there.
Upvotes: 3