How to distribute the number of elements in a bucket so that it is Within a Range - Algorithm

Question

I have been solving a problem where in say I have 50 elements n1, n2, n3, ... , n50.

Now I have a limited number of buckets, say 5 buckets and the bucket can hold a range from, say 100 to 150 only (which is nothing but the sum of elements in that bucket), but neither less than 100, nor more than 150.

Which algorithm is most suitable for solving this problem, such that all the 5 buckets are used and all the elements (n1, n2, n3, ...) are also used up.

If a bucket is not used or if any element is left out, then the algorithm just returns "InvalidConditionsFound".

I tried Knapsack which gives you a combination as close to a gived number, but how to get it within a range and also make sure that it chooses wisely such that all the buckets are getting filled, and not that two bucket gets 150 full and the other bucket is only at, say 50

josliber · Accepted Answer

One approach would be to model this as an integer program. Let's assume there are "m" numbers y_1, y_2, ..., y_m and "n" buckets. Define variables x_ij, with an index "i" for each of the numbers you are trying to assign and an index "j" for each bucket. These are binary variables indicating if each number is assigned to each bucket.

Now you have two sets of logical constraints. First, you need to assign each number to exactly one bucket. You can do this by adding the following constraint for each number "i":

x_i1 + x_i2 + ... + x_in = 1

You also have the range constraints on each bucket "j":

100 <= y_1 x_1j + y_2 x_2j + ... + y_m x_mj <= 150

Really you are just looking for any feasible solutions, so you can just set the objective to 0 and treat this as a feasibility problem.

While you are solving an NP-complete problem so this is a theoretically challenging exercise, you may find that modern optimization software can solve the problem for problem sizes of interest to you.

To give a sense of the scalability, consider the following implementation using the lpSolve package in R; it returns the assignments from numbers to buckets when a valid assignment exists and otherwise returns a vector of NA values:

library(lpSolve)
range.assign <- function(weights, n, min.sum, max.sum) {
  m <- length(weights)
  one.mat <- t(sapply(1:m, function(i) c(replicate(n, 1*((1:m) == i)))))
  w.mat <- t(sapply(1:n, function(j) c(rep(0, m*(j-1)), weights, rep(0, m*(n-j)))))
  mod <- lp(objective.in = rep(0, n*m),
            const.mat = rbind(one.mat, w.mat, w.mat),
            const.dir = rep(c("=", ">=", "<="), c(m, n, n)),
            const.rhs = rep(c(1, min.sum, max.sum), c(m, n, n)),
            all.bin=TRUE)
  if (mod$status == 0) {
    apply(matrix(mod$solution, nrow=m), 1, function(x) which(x >= 0.999))
  } else {
    rep(NA, m)
  }
}
range.assign(1:5, 2, 5, 10)
# [1] 1 1 1 1 2
range.assign(1:5, 2, 5, 6)
# [1] NA NA NA NA NA

I tested this with m weights randomly sampled from [1, 2, ..., 10], acceptable range for a bucket [100, 150], and total number of buckets n = ceiling(5.5*m / 125). I saw the following runtime scaling:

m = 100, n = 5: 0.1 seconds
m = 200, n = 9: 0.6 seconds
m = 300, n = 14: 2.2 seconds
m = 400, n = 18: 16.9 seconds

It seems like the problem can be solved exactly using free solvers for problems with a dozen buckets and a few hundred weights (and this weight vector structure). Of course your complexity result suggests it won't be efficiently solvable for huge problems, but you may be able to solve instances with sizes that interest you. Further optimizations may be possible by:

Using commercial solvers such as Gurobi or cplex (both are non-free in general but free for academic use).
Inputting the constraint matrix in a sparse format.

How to distribute the number of elements in a bucket so that it is Within a Range - Algorithm

Answers (2)

Related Questions