Reputation: 668
I'm trying to return a vector (a 1-D NumPy array) whose elements sum to 1. The key is that it has to sum to exactly 1.0, as it represents a percentage. However, there seem to be a lot of cases where the sum does not equal 1.0 even after I divide each element by the total. In other words, the sum of x does not equal 1.0 even when x = x'/sum(x').
One case where this occurred is the vector below:
x = np.array([0.090179377557090171, 7.4787182000074775e-05, 0.52465058646452456, 1.3594135000013591e-05, 0.38508165466138505])
The sum of this vector, x.sum(), is 1.0000000000000002, whereas the sum of the vector divided by that value is 0.99999999999999978. Renormalizing again just bounces back and forth between those two sums.
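In an interactive session, using the x above (the exact repr may differ by Python/NumPy version):

>>> x.sum()
1.0000000000000002
>>> (x / x.sum()).sum()
0.99999999999999978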
What I did was round the elements of the vector to the 10th decimal place, np.round(x, decimals=10), and then divide by the sum of the rounded vector, which results in a sum of exactly 1.0. This works when I know the size of the numerical error.
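For concreteness, the workaround looks like this (a sketch; the variable names are mine, and decimals=10 was chosen because I knew the error here was smaller than that):

x_rounded = np.round(x, decimals=10)
x_normed = x_rounded / x_rounded.sum()
x_normed.sum()  # exactly 1.0 for this particular vector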
Unfortunately, that would not be the case in usual circumstances.
I'm wondering if there is a way to correct the numerical error given only the vector itself, so that its sum equals exactly 1.0.
Edit: Is floating point math broken? does not answer my question, as it only explains why the difference occurs, not how to resolve it.
Upvotes: 2
Views: 1393
Reputation: 665
A bit of a hacky solution:
x[-1] = 0            # temporarily drop the last element from the total
x[-1] = 1 - x.sum()  # set it to exactly the residual needed to reach 1.0
Essentially, this shoves the numerical errors into the last element of the array. (No rounding beforehand is needed.)
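For example, on the normalized vector from the question, this should give a sum of exactly 1.0 (a quick sketch; exact floating-point behavior can vary slightly across NumPy versions):

import numpy as np

x = np.array([0.090179377557090171, 7.4787182000074775e-05,
              0.52465058646452456, 1.3594135000013591e-05,
              0.38508165466138505])
x = x / x.sum()        # sums to 0.99999999999999978, not 1.0
x[-1] = 0              # drop the last element from the total
x[-1] = 1 - x.sum()    # replace it with exactly the missing residual
print(x.sum() == 1.0)  # expected: True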
Note: A mathematically simpler solution:
x[-1] = 1.0 - x[:-1].sum()
does not work, due to the different behavior of numpy.sum on the whole array versus a slice: the two sums need not add the elements in the same order, so they can round differently. That is why the residual has to be computed from a full-array x.sum(), as above.
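For comparison, continuing from the snippet above:

x[-1] = 1.0 - x[:-1].sum()  # residual computed from a slice sum
print(x.sum() == 1.0)       # not guaranteed True: the slice sum and the
                            # full-array sum may round differently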
Upvotes: 3