Reputation: 9582
I am writing an algorithm that involves taking a set of numbers and putting them into buckets. Can you help me implement these two simple methods? Let me know if I need to explain more.
// return a vector where each element represents the
// size of the range of numbers in the corresponding bucket
// buckets should be equal in size +/- 1
// doesn't matter where the bigger/smaller buckets are
vector<int> makeBuckets(int max, int numberOfBuckets);
// return which bucket n belongs in
int whichBucket(int max, int numberOfBuckets, int n);
Example output
makeBuckets(10, 3) == { 3, 3, 4 }; // bucket ranges: (0, 2), (3, 5), (6, 9)
whichBucket(10, 3, 0) == 0;
whichBucket(10, 3, 1) == 0;
whichBucket(10, 3, 2) == 0;
whichBucket(10, 3, 3) == 1;
whichBucket(10, 3, 4) == 1;
whichBucket(10, 3, 5) == 1;
whichBucket(10, 3, 6) == 2;
whichBucket(10, 3, 7) == 2;
whichBucket(10, 3, 8) == 2;
whichBucket(10, 3, 9) == 2;
Upvotes: 4
Views: 4862
Reputation: 14789
Given an integral range [B, E), the most evenly (The size differences never exceed 1.) split Nbin bins are:
[ ⌊B+E×Nindex÷Nbin⌋, ⌊B+E×(Nindex+1)÷Nbin⌋ ) where Nindex=0 to (Nbin-1).
Example ― Getting pixel ranges to feed multiple threads:
for(unsigned int worker_index = 0; worker_index < n_workers; ++worker_index) {
std::cout
<< "[" << n_pixels * worker_index / n_workers << ", "
<< n_pixels * (worker_index + 1) / n_workers << ")"
<< std::endl;
}
[0, 0)
[0, 0)
[0, 0)
[0, 1)
[1, 1)
[1, 1)
[1, 2)
[2, 2)
[2, 2)
[2, 3)
[3, 3)
[3, 3)
[3, 4)
[4, 4)
[4, 4)
[4, 5)
[0, 1)
[1, 2)
[2, 3)
[3, 4)
[4, 5)
[5, 6)
[6, 7)
[7, 9)
[9, 10)
[10, 11)
[11, 12)
[12, 13)
[13, 14)
[14, 15)
[15, 16)
[16, 18)
[0, 2259520)
[2259520, 4519040)
[4519040, 6778560)
[6778560, 9038080)
[9038080, 11297600)
[11297600, 13557120)
[13557120, 15816640)
[15816640, 18076160)
[18076160, 20335680)
[20335680, 22595200)
[22595200, 24854720)
[24854720, 27114240)
[27114240, 29373760)
[29373760, 31633280)
[31633280, 33892800)
[33892800, 36152320)
Upvotes: 2
Reputation: 52337
Let's say you need to divide a range [0,n] into k buckets.
How to best create buckets depends on the size of n. For example, if you have a small n, you could just have an array of size n that stores the bucket index for each number. In the loop above, you would fill up that array. Then the query whichBucket() would run in O(1).
If n is large, however, this is impractical. In this case, you would do your bucket sorting completely implicitely. That means, for each incoming query, you can directly compute the corresponding bucket index using e and o.
Upvotes: 9
Reputation: 9582
Thanks ypnos for the answer. Here's my implementation in C++.
vector<int> makeBuckets(int max, int numberOfBuckets) {
int e = max / numberOfBuckets;
vector<int> b(numberOfBuckets, e);
fill(b.begin(), b.begin() + (max % numberOfBuckets), e + 1);
return b;
}
int whichBucket(int max, int numberOfBuckets, int n) {
return n * numberOfBuckets / max;
}
Upvotes: 4
Reputation: 217275
You may use the naive implementation:
std::vector<std::size_t> parts(std::size_t m, std::size_t size)
{
std::vector<std::size_t> res(size);
for (std::size_t i = 0; i != m; ++i) {
++res[i % size];
}
return res;
}
std::size_t whichPart(std::size_t m, std::size_t size, std::size_t n)
{
std::size_t index = 0;
for (auto i : parts(m, size)) {
if (n < i) {
return index;
}
++index;
n -= i;
}
throw std::runtime_error("invalid argument");
}
Upvotes: 1