Reputation: 30293

What's the most efficient way to evenly fill an unsorted list of "buckets" of varying sizes

Suppose I have an unsorted list of buckets. (Each bucket has a size property.) Suppose I have a quantity Q that I must distribute across the list of buckets as evenly as possible (i.e. minimize the maximum).

If the buckets were sorted in increasing size, then the solution would be obvious: fully fill each bucket, say buckets[i], until Q/(buckets.length-i) <= buckets[i]->size, and then fill the remaining buckets with that same quantity, Q/(buckets.length-i), as illustrated:

Filling buckets.

What's the most efficient way to solve this if the buckets aren't sorted?

I can only think of iterating like this (pseudocode):

while Q > 0
    for i in 0..buckets.length-1
        q = Q/(buckets.length-i)
        if q > buckets[i]->size
            q = buckets[i]->size
        buckets[i]->fill(q)
        Q -= q

But I'm not sure if there's a better way, or if sorting the list would be more efficient.

(The actual problem I face has more to it, e.g. this "unsorted" list is actually sorted by a separate property "rank", which determines which buckets would get the extra fills when quantities don't divide evenly, etc. So, for example, to use the sort-then-fill method, I'd sort the list by bucket size and rank. But knowing an answer to this would help me figure out the rest.)

Upvotes: 6

Answers (6)

Marc Glisse

Reputation: 7945

In one step, you start with n unsorted buckets of finite capacity, k infinite buckets (you store k, not a list of those, and at the first iteration k=0), and an amount of water w. In O(n) time, we are going to reduce the problem to another instance with n', k', w' where n' < c * n for a constant c < 1. Iterating this procedure will solve the problem (once n is a constant, you can solve it in constant time) in linear time: n+c*n+c^2*n+...=O(n).

Among all n finite capacities, pick the median (i.e. pick one such that half of the capacities are higher and half are lower). This can be done in O(n) time (selection algorithm). Compute the sum of 1) the lower capacities and 2) the median capacity multiplied by the number of buckets of higher capacity (including the infinite ones).

If that's less than w, you know you will need to fill the buckets higher, so in particular all the lower capacity buckets will be filled. Remove them, remove the sum of their capacities from w, and you are done for this iteration, n'=n/2.

If on the other hand the sum is larger than w, you know that no bucket will be filled to the median capacity or higher. All buckets of higher capacity can thus be removed and their number added to the number of infinite buckets. w remains unchanged. Again, n'=n/2, and we are done.

A few easy details are skipped (in particular how to handle the case where many buckets have exactly the same capacity) to keep it short. You also need some cleanup at the end, once you know the right level of water, to set it for each "infinite" (i.e. non-full) bucket.

Upvotes: 1

Andrew Tomazos

Reputation: 68728

If you can determine q, the appropriate minimum level to fill each bucket such that the total is Q, than the linear solution is clear:

for (bucket b : buckets)
{
    int f = max(b.capacity(), q);
    b.fill(f);
}

So the problem is determining that level q.

You could binary search for q. That is we know q is an integer between min(b.capacity) and max(b.capacity). ie:

start with a candidate q' half way between min(capacity) and max(capacity)
make a pass of the buckets calculating the total amount Q' resulting from using q'
if (Q' > Q) than repeat with q' reduced by half
if (Q' < Q) than repeat with q' increased by half
return q = q'

Each pass of step 2 is O(N), and there will be log(L) passes where L = max(capacity) - min(capacity)

This works better than sorting when L << N

A sufficient statistic is to reduce the buckets to a histogram:

unordered_set<int,int> bucket_capacity;

for (bucket b : buckets)
    bucket_capacity[b.capacity]++;

This is still linear however in the worst case doesn't get us much because the buckets may have distinct sizes, however it bounds the passes by L so the efficiency is now O(min(L,N) * logL)

Again this works well when L << N the efficiency becomes O(LlogL)

I suspect the following is true, but am not 100%: In the case where L >> N it can be shown that there is no linear solution. First we assume we have a linear solution. We then use this solution as a tool to do a comparison sort in linear time. It has been shown comparison sort is impossible in linear time, therefore our assumption must be false, and there is no linear solution.

Upvotes: 2

leemes

Reputation: 45725

An alternative idea would be as follows. Determine the average number of items per buckets. Then try to fill all buckets with that number (not all buckets can hold that number of items, in general).

Afterwards, you have a number of remaining items to be placed in buckets (because not all have fit in the previous iteration) as well as a list of buckets which can hold more items than they currently contain (calculated in the previous iteration).

Again, calculate the average number of items to be distributed on that remaining buckets based on the remaining number of items to be distributed.

Repeat until you placed all items.

I expect a running time of O(n * log n), but didn't analyze it. It's the same running time than your sort-then-fill method, however, it is expected to be lower if your buckets have only a limited number of different sizes, like: some are small, some are big, some are huge.

Upvotes: 1

leemes

Reputation: 45725

I think it might be possible in linear time, however I'm stuck at a certain point. Maybe you can solve the problem, maybe it can't be solved this way.

Consider the following algorithm.

Based on binary search, we want to find the smallest bucket which isn't fully filled. Finding such a bucket in a list of bucket is maybe possible in linear time, but as I said, I'm stuck here. Once we found that bucket, the rest becomes trivial, since for all smaller buckets we sum up their sizes, subtract it from the total number of items to be placed, divide this by the number of buckets larger or equal to the bucket we just found.

So the following is an attempt to solve the problem: What is the smallest bucket which isn't fully filled? The algorithm is motivated by QuickSelect.

Pick a pivot bucket. See if it is smaller or larger than the bucket we are looking for. (This step is trivial.)

If it is smaller, sum up the sizes of all buckets smaller or equal than this one, subtract this sum from the total number of items and continue the search on the set containing all larger buckets.
If it is larger, we would have to do a similar thing, but now subtract the number of items which are placed in all buckets larger than this one. We don't know the number of items to be placed in these buckets. This is the problem... But if we knew, we'd continue the search on the set containing all smaller buckets.

If this algorithm worked, it would run in expected linear time for random pivot elements (see QuickSelect).

Upvotes: 2

Mecki

Reputation: 133189

In many cases, where the solution is "so simple" or "so effective" if the data was sorted, yet very complicated or ineffective in case it isn't, the best solution is very often to just sort the data first and then go for the simple, effective solution. Even though this means you will have the overhead of sorting the data first, there are plenty of very good sort algorithms available for pretty much any purpose and in many cases the total overhead of "first sorting the data and then applying a simple, effective algorithm to it" is still lower than "not sorting the data and applying a very complicated, ineffective algorithm to it".

The fact that you need the data sorted by a different key only means to me, that you need two lists here, each one sorted by a different criteria. Unless we are talking of several thousand buckets here, the memory overhead for a second list will most likely not be an issue (after all both lists only contains pointers to your bucket objects, that means 4/8 bytes per pointer, depending on fact if you have 32 or 64 bit code). One list has the buckets sorted by size the other list has the buckets sorted by "rank" and when when adding new items as described in your question, you use the "sorted by size list", while using the "sorted by rank" list the same way you are using it already by now.

Upvotes: 3

Marshall Clow

Reputation: 16690

Why do you need the list of buckets to be sorted? Just iterate over the buckets twice.

The first time count up all the sizes. From that you can say, "I want K items in every bucket"

Second time though, fill up the buckets.

Upvotes: -1

What&#39;s the most efficient way to evenly fill an unsorted list of &quot;buckets&quot; of varying sizes

Answers (6)

Related Questions

What's the most efficient way to evenly fill an unsorted list of "buckets" of varying sizes