Reputation: 343
I have a Java application that uses high-dimensional vectors of doubles. It normalizes these vectors by multiplying the vector components by the reciprocal of the Euclidean norm. Sometimes the resulting vector has a norm that is not equal to 1 to machine precision. That this occurs does not surprise me.
My question is: how do I normalize the vector such that the resulting vector has unit length to machine precision?
These are the methods for my Vector class to compute the norm and normalize the vector:
public double getFrobeniusNorm() {
    return Math.sqrt(getFrobeniusNormSquared());
}

public double getFrobeniusNormSquared() {
    double normSquared = 0.0;
    int numberOfRows = getRowDimension();
    int numberOfColumns = getColumnDimension();
    for (int i = 0; i < numberOfRows; ++i) {
        for (int j = 0; j < numberOfColumns; ++j) {
            double matrixElement = get(i, j);
            normSquared += matrixElement * matrixElement;
        }
    }
    return normSquared;
}

public void normalize() {
    double norm = getFrobeniusNorm();
    if (norm == 0) {
        throw new ArithmeticException("Cannot get a unit vector from the zero vector.");
    } else {
        double oneOverNorm = 1.0 / norm;
        multiplyEquals(oneOverNorm);
    }
}
Since this is Java, I can't use techniques specific to the operating system and processor, but otherwise this seems like a standard floating-point algorithm issue.
I can improve the norm calculation using Kahan summation and/or dividing out the largest component, but the consistency between normalizing and calculating the norm is the real issue. The norm is more important than the direction, so I see this as finding the floating point vector closest in direction to the original vector with the constraint that the norm is 1 to machine precision. For my purposes, the original vector is exact.
Suppose the original vector is u. I call u.normalize(). Then, if I compute Math.abs(u.getFrobeniusNorm() - 1d), in some cases the result is hundreds of ulps. This is the problem. I can accept that the vector norm has error. I just want to normalize the vector such that the norm, as calculated by u.getFrobeniusNorm(), is 1 to within as few ulps as possible. Improving u.getFrobeniusNorm() makes sense, but I don't think that solves the consistency issue.
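For concreteness, here is a minimal sketch of the "dividing out the largest component" idea mentioned above, written against the same accessors as the Vector class shown earlier (getRowDimension, getColumnDimension, get); the method name is just for illustration:
public double getScaledFrobeniusNorm() {
    int numberOfRows = getRowDimension();
    int numberOfColumns = getColumnDimension();
    // First pass: find the largest magnitude among the elements.
    double maxAbs = 0.0;
    for (int i = 0; i < numberOfRows; ++i) {
        for (int j = 0; j < numberOfColumns; ++j) {
            maxAbs = Math.max(maxAbs, Math.abs(get(i, j)));
        }
    }
    if (maxAbs == 0.0) {
        return 0.0;
    }
    // Second pass: sum the squares of the scaled elements (each at most 1),
    // then undo the scaling outside the square root.
    double sum = 0.0;
    for (int i = 0; i < numberOfRows; ++i) {
        for (int j = 0; j < numberOfColumns; ++j) {
            double scaled = get(i, j) / maxAbs;
            sum += scaled * scaled;
        }
    }
    return maxAbs * Math.sqrt(sum);
}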
Upvotes: 4
Views: 1833
Reputation: 343
While the original question remains interesting, I recently found the source of the problem in my code. In other code, I was improving a summation by implementing a variation of Kahan summation. I revisited the unit-vector code and found that the normalization was not the problem.
The normalization method involves three steps: calculate the norm, scale the components by the reciprocal of the norm, and calculate the norm of the resulting unit vector to check how well it was normalized.
To improve the normalization method, I calculated the norm with the improved summation, scaled the components by the reciprocal of that more accurate norm, and calculated the norm of the unit vector with the improved summation to check how well it was normalized. Sure enough, the unit vector was normalized to a much tighter tolerance, roughly machine precision * dimension. I compared the improved normalization method to the previous method, and it was better. What surprised me was that the old normalization method was just as accurate if the second norm calculation (the check) used the improved summation.
So it was not the normalization itself that caused the problem, but rather the check of the normalization. It appears that naive summation is less accurate (even in a relative sense) for sums near 1 than for many other values. I say "many other values" because the original problem occurred for vectors of all magnitudes in practice, but I suspect that some vectors, and therefore some sums, show the same bad behavior as unit vectors (with sums near 1). However, the problem norm values are probably sparsely distributed over the real numbers, like powers of 2.
In the original method, the two vector norm calculations had different relative errors, so the errors did not cancel in the check. If you start with a vector whose norm is already near one, the two calculations have nearly identical relative errors and the check looks good, even though the normalization itself is no more accurate.
So now, I don't calculate the vector norm of the unit vector as a check.
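For reference, here is a minimal sketch of a Kahan-style compensated sum of squares along the lines of the improved summation described above; the accessor names are borrowed from the Vector class in the question, and the method name is just for illustration:
public double getFrobeniusNormSquaredCompensated() {
    double sum = 0.0;
    double compensation = 0.0; // accumulates the low-order bits lost by the naive sum
    int numberOfRows = getRowDimension();
    int numberOfColumns = getColumnDimension();
    for (int i = 0; i < numberOfRows; ++i) {
        for (int j = 0; j < numberOfColumns; ++j) {
            double element = get(i, j);
            double term = element * element - compensation;
            double tentativeSum = sum + term;
            compensation = (tentativeSum - sum) - term;
            sum = tentativeSum;
        }
    }
    return sum;
}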
Upvotes: 0
Reputation: 49
Reading your question got me thinking about the source of the error. I identified that the problem starts with calculating the sqrt() of the value returned by your getFrobeniusNormSquared(). I am afraid that exactness to machine precision is indeed impossible, as others have suggested. Nonetheless, the following sketch (written here for a three-component vector) improved it a bit by postponing the sqrt():
double normSquared = x * x + y * y + z * z;
if (normSquared > 0.0) {
    double oneOverNormSquared = 1.0 / normSquared;
    // Postpone the square root: take it per component so the rounding of
    // sqrt(normSquared) is not baked into every element.
    return new Vector(
            Math.signum(x) * Math.sqrt(x * x * oneOverNormSquared),
            Math.signum(y) * Math.sqrt(y * y * oneOverNormSquared),
            Math.signum(z) * Math.sqrt(z * z * oneOverNormSquared));
} else {
    throw new ArithmeticException("Cannot get a unit vector from the zero vector.");
}
Of course, this will cost you some CPU cycles, and the result is still not exact to machine precision, but it improves things a bit.
Upvotes: 1
Reputation: 222763
I have been thinking about this a little but have not had time to work on the mathematics. I am thinking along these lines:
Suppose you calculate the norm and find it is 1 plus a small number d (which might be positive or negative). You might consider changing elements of the vector by small amounts until the calculated norm is 1.
Suppose you are going to make a change e to some element with value x. The cost is that it changes the direction of the vector by an angle of approximately arccos((1+xe)/sqrt(1+2xe+e²)), using some approximations I do not document since this is preliminary. This function has a negative slope for positive x, so the cost is lower for larger x (and vice versa for negative x). The sign of e does not change the direction of the slope, so it is irrelevant to this cost. The benefit is that the change alters the square of the norm by (x+e)² - x² = 2xe + e². The sign of e determines whether this is positive or negative, so we choose e with sign opposite to x when the norm must shrink (and the same sign when it must grow). Either way, the benefit is proportional to the magnitude of x. Clearly the best cost-benefit ratio comes from choosing the element with the largest magnitude.
Thus, if you are going to make one change to the elements, change the element with the largest magnitude.
I have not yet considered partitioning the change: Suppose instead of one change, we make two or more, applied to different elements. Does that increase the cost-benefit ratio?
Another consideration is that the change we can make to elements is quantized, so, instead of making a change of e to a larger element, we might prefer a change of e/2 to a smaller element if it changes the calculated norm by the same amount.
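A rough sketch of that single adjustment in code, assuming the accessors from the question plus a hypothetical set(i, j, value) mutator; it nudges the largest-magnitude element one ulp at a time instead of solving for e directly:
public void nudgeTowardUnitNorm(int maxSteps) {
    // Locate the element with the largest magnitude.
    int bestI = 0, bestJ = 0;
    for (int i = 0; i < getRowDimension(); ++i) {
        for (int j = 0; j < getColumnDimension(); ++j) {
            if (Math.abs(get(i, j)) > Math.abs(get(bestI, bestJ))) {
                bestI = i;
                bestJ = j;
            }
        }
    }
    for (int step = 0; step < maxSteps; ++step) {
        double norm = getFrobeniusNorm();
        if (norm == 1.0) {
            return;
        }
        double x = get(bestI, bestJ);
        // Step one ulp in the direction that moves the computed norm toward 1.
        boolean shrinkMagnitude = (norm > 1.0) == (x > 0.0);
        double nudged = shrinkMagnitude ? Math.nextDown(x) : Math.nextUp(x);
        set(bestI, bestJ, nudged); // hypothetical mutator
        if (Math.abs(getFrobeniusNorm() - 1.0) >= Math.abs(norm - 1.0)) {
            set(bestI, bestJ, x); // no improvement: undo the last change and stop
            return;
        }
    }
}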
Upvotes: 0
Reputation: 14205
You may be able to reformulate your stated question by doing some awkward greedy rounding hack. (You can also probably formulate it as an even-more-awkward network flow problem.) I don't think you can guarantee a "nice" rounding here where the stuff rounded up all has larger fractional parts than all of the stuff rounded down.
Backing up a little bit, I'm not sure why you got yourself into a position where you need the norm of a vector to be exactly 1, rather than within n*machine epsilon of 1. It might be a better idea to rethink the code that uses the normalised vector than to rethink the normalisation itself.
(You also say this: "As for the question of unity, the unit vector has norm 1 exactly, and all my equations use that fact. I want the floating point representation closest to that unit vector (by inner product)." This changes the game completely; the closest vector in Euclidean norm to the exactly-normalised vector will be the rounded normalised vector.)
Upvotes: 1
Reputation: 20059
Simple: your requirement cannot be met. Assuming any imaginable vector is possible, it can't even be met with any precision short of infinite.
You can get reasonably close to 1.0, and that should be good enough in most cases (your current code should already achieve this).
If it turns out the accuracy is too low for your case, you need to perform an error analysis (and since you're asking the question in the first place, get someone with experience to do the error analysis for you; this will cost money).
The basics of floating-point accuracy are explained here: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html (What Every Computer Scientist Should Know About Floating-Point)
Upvotes: 1