Reputation: 175
Given N buckets and some elements E1 (W1) E2(W2). I want to distribute the N buckets between The elements Ei based on their weights Wi
For example N = 20, W1 = 5 W2 = 5 W3 = 10 so
E1_buckets = 20*(5/20) = 5
E2_buckets = 20*(5/20) = 5
E3_buckets = 20*(10/20) = 10
I must have the individual buckets (5+5+10=20) sum up to N. I was thinking of doing something like this
bucket[i] = round(N*(W[i]/TOT_WGT) where W[i] = element weight, and TOT_WGT = sum of weights W[i]
It seems however that I could run into error from imprecision in the representation of floating point numbers. Is it possible with floating point arithmetic to guarantee that the sum of buckets would always sum to N?
An alternative way would be to always take the floor and assign the excess to some random element
bucket[i] = floor(N*(W[i]/TOT_WGT)
bucket[k] += (N-sum_of_buckets)
While it doesn't guarantee perfect weighting I do get the buckets summing to N. Any thoughts, am I missing something and there is a possibly simple way to do this?
Upvotes: 0
Views: 412
Reputation: 4941
The best way to do it is to not to represent a bin using its width. You're trying to represent a continuous interval, and doing that by taking the union of subintervals aligned ~just right~ is tricky to say the least.
Instead, calculate the positions of the interior dividers ({5, 10} in your example), then represent your buckets as pairs of endpoints (the endpoints being {0, 5, 10, 20} in the example). Whenever you need the width of a bin, return the difference between that bin's two endpoints. Yeah, the widths of the bins might be off from the weights by a bit, but if your application is sensitive to that error your should really be using an exact numeric type instead.
Upvotes: 1
Reputation: 19601
Instead of calculating the number of buckets in element i, you could calculate the number of buckets in the first i elements, and then later subtract the number of buckets in the first i-1 elements to get the number of buckets in element i.
In this case, the number of buckets in the first i elements could be round (N * SUM_k_up_to_i(W[k])/TOT_WGT). In that case the number of elements in all buckets will be round(N * TOT_WGT/TOT_WGT) which is very likely to sum to N - and which you could in any case replace with N, and you are guaranteed that the sum of the buckets will be N.
Upvotes: 1