fer
fer

Reputation: 85

Homework: Dynamic programming - Find optimal subset allocation for an array of integer

I'm not sure if this is the place to ask homework question - I apologise if it is not. And to be upfront about this, I don't want answers, just ways to think about the question.

Suppose I have an array of n integer, which represents the position of villages along a straight line axis. Now, I am allowed to build k toilets, with k being predetermined and that n >= k. I have to build the k toilets such that the total sum of distance between each village and the nearest toilet is minimized.

How do I find what is the minimal possible value of the allocation?

A simple example:
Villages: [0, 1, 2, 3, 4], n = 5
Toilet(Optimal): [2], if k = 1 thus making the sum = 6, as Sum([|0-2|], [|1-2|], [|2-2|], [|3-2|], [|4-2|])
Toilet(Optimal): [1, 3] if k = 2 thus making the sum = 3, as Sum([|0-1|], [|1-1|], [|2-1|], [|3-3|], [|4-3|])

Now, one way that I have thought of, is to generate all possible configuration using recursion. Then for each toilet configuration, calculate the minimum distance, and pick the minimum of minimum distance.

It works if k and n is kept small. But suppose I have 40 villages and 15 toilets to allocate. This makes the problem unwieldy, as it suddenly becomes a problem of size 40 C 15, or 40225345056.

I've a feeling it has something to do with dynamic programming, but I just can't seem to bend the concept of dynamic programming to the current problem.

And again, to reiterate, please don't pass me the answer, but different ways to frame the question.

Upvotes: 1

Views: 522

Answers (2)

גלעד ברקן
גלעד ברקן

Reputation: 23955

As an aside, using some algebra, we can find that the sum of distances from the average can be calculated without summing those distances individually. One way is:

2 * [(num_smaller - 1) * sum_smaller + num_smaller * (sum_larger - (n-1) * average)]

where: n is the total number of elements
       num_smaller is the cardinality of {a | a <= average}
       num_larger is the cardinality of {b | b > average}
       sum_smaller is sum of {a | a <= average}
       sum_larger is the sum of {b | b > average}

Why: 
                   v = (a1 + a2 + a3 ... + b3 + b2 + b1) / n

                 n*v = (a1 + a2 + a3 ... + b3 + b2 + b1)

       n*v - (n-1)*v = (a1 + a2 + a3 ... + b3 + b2 + b1) - (n-1)*v

       n*v - n*v + v = (a1 + a2 + a3 ... + b3 + b2 + b1) - (n-1)*v

                   v = (a1 + a2 + a3 ... + b3 + b2 + b1) - (n-1)*v

Now we are looking for the sum of absolute differences from the average:

abs_sum_diffs = (v - a1) + (v - a2) + (v - a3) + ... + |v - b3| + |v - b2| + |v - b1|
  where a's are lower than or equal to v, and b's are greater than v         

Let's first express just the sum of differences:

      v - a1 = (     a2 + a3 ... + b3 + b2 + b1) - (n-1)*v
    + v - a2 = (a1      + a3 ... + b3 + b2 + b1) - (n-1)*v
    + v - a3 = (a1 + a2      ... + b3 + b2 + b1) - (n-1)*v
    ...
    + v - b1 = (a2 + a2 + a3 ... + b3 + b2     ) - (n-1)*v
    + v - b2 = (a1 + a2 + a3 ... + b3      + b1) - (n-1)*v
    + v - b3 = (a1 + a2 + a3 ...      + b2 + b1) - (n-1)*v
   _________________________________________________________
   sum_diffs = (n-1)*sum(a's)  +  (n-1)*sum(b's) - n*(n-1)*v // each a,b appears (n-1) times

             = (n-1) * (sum (a's + b's) - n * v)
             = (n-1) * (0)  // because sum(a's + b's) = n * v
             = 0

So the negative differences of elements from the average, (v - b), cancel out the positive differences of elements from the average, (v - a), which means their sums equal each other. To find the absolute sum of differences, we can use similar algebra to find a formula just for the positive differences,

sum {v - a | a <= v}

and multiply it by two, which I leave as an exercise.

Let's assume the location of the toilet will be placed at the average of each k'th group's elements. To avoid recalculating sub-problems, we can build a matrix, M[k][i], starting with k = 1, for each set of i villages, representing the optimal way to group i villages into k groups.

Let's look at your example, [0, 1, 2, 3, 4], at k = 1 for each i:

i = 0, M[1][0]                         = 0
i = 1, M[1][1] = 0.5 + 0.5             = 1
i = 2, M[1][2] = 1 + 0 + 1             = 2
i = 3, M[1][3] = 1.5 + 0.5 + 0.5 + 1.5 = 4
i = 4, M[1][4] = 2 + 1 + 0 + 1 + 2     = 6

Now, when we calculate k = 2, we can rely on our first row of the matrix. This time, for each i, we check the options for including this village in different j-sized groups, adding the optimal cost for k-1 groups up to village i-j, which we calculated in the previous step:

i = 0;               M[2][0] = null  // we skip dividing one village into two groups

i = 1, j = 1;        M[2][1] = 0 + M[k-1][i-j] = 0 + M[1][0] = 0

i = 2, j = 1,2;      M[2][2] = min(0 + M[1][1] = 1
                                  ,1 + M[1][0] = 1)          = 1

i = 3, j = 1,2,3;    M[2][3] = min(0 + M[1][2] = 2
                                  ,1 + M[1][1] = 2
                                  ,2 + M[1][0] = 2)          = 2

i = 4, j = 1,2,3,4;  M[2][3] = min(0 + M[1][3] = 4
                                  ,1 + M[1][2] = 3
                                  ,2 + M[1][1] = 3
                                  ,4 + M[1][0] = 4)          = 3

The matrix now looks like:

[[x][x][x][x][x]
,[0][1][2][4][6]
,[x][0][1][2][3]]

We can continue the same procedure for larger k and i.

Upvotes: 1

Jon Snow
Jon Snow

Reputation: 11882

I think first you need to figure out the basic problem: when you have N villages and you have to build 1 toilet, where to build it.

Your N should start with 1. Then 2, 3...

Next, you're going to build 2 toilets among the N villages. I think it's clear that you'll need to divide the villages into 2 groups, which will use the 2 toilets respectively.

The problem becomes how to divide the villages into 2 groups. In this case, start N from 1 also, then 2, 3...

Have your algorithm work for N (any positive integer) and 2 toilets.

Once you get this far, I think you can continue onto 3 toilets. The problem may be solved already. If not, I'm sure once you can build 3 toilets among N villages, you can build M (M <= N) toilets.

The basic methodology here is: start with the simplest case and solve it, then increase the problem complexity step by step and make sure your solution can handle the increased complexity. Recursion sounds like a good way to go. Don't worry about performance until it is really a problem. Solve the problem first, optimise later.

Good luck!

Upvotes: 1

Related Questions