Swiftslide
Swiftslide

Reputation: 1347

Simple storage allocation algorithm

We have a whole bunch of machines which use a whole bunch of data stores. We want to transfer all the machines' data to new data stores. These new stores vary in the amount of storage space available to the machines. Furthermore each machine varies in the amount of data it needs stored. All the data of a single machine must be stored on a single data store; it cannot be split. Other than that, it doesn't matter how the data is apportioned.

We currently have more data than we have space, so it is inevitable that some machines will need to have their data left where it is, until we find some more. In the meantime, does anyone know an algorithm (relatively simple: I'm not that smart) that will provide optimal or near-optimal allocation for the storage we have (i.e. the least amount of space left over on the new stores, after allocation)?

I realise this sounds like a homework problem, but I assure you it's real!

Upvotes: 1

Views: 866

Answers (1)

Tyler Durden
Tyler Durden

Reputation: 11558

At first glance this may appear to be the multiple knapsack problem (http://www.or.deis.unibo.it/knapsack.html, chapter 6.6 "Multiple knapsack problem - Approximate algorithms"), but actually it is a scheduling problem because it involves a time element. Needless to say it is complicated to solve these types of problems. One way is to model them as network flow and use a network flow library like GOBLIN.

In your case, note that you actually do not want to fill the stores optimally, because if you do that, smaller data packages will be more likely to be stored because it will lead to tighter packings. This is bad because if large packages get left on the machines then your future packings will get worse and worse. What you want to do is prioritize storing larger packages, even if that means leaving more extra space on the stores, because then you will gain more flexibility in the future.

Here is how to solve this problem with a simple algorithm:

(1) Determine the bin sizes and sort them. For example, if you have 3 stores with space 20 GB, 45 GB and 70 GB, then your targets are { 20, 45, 70 }.

(2) Sort all the data packages by size. For example, you might have data packages: { 2, 2, 4, 6, 7, 7, 8, 11, 13, 14, 17, 23, 29, 37 }.

(3) If any of the packages sum to > 95% of a store, put them in that store and go to step (1). Not the case here.

(4) Generate all the permutations of two packages.

(5) If any of the permutations sum to > 95% of a store, put them in that store. If there is a tie, prefer a combination with a bigger package. In my example, there are two such pairs { 37, 8 } = 45 and { 17, 2 } = 19. (Notice that using { 17, 2 } trumps using { 13, 7 }). If you find one or more matches, go back to step (1).

Okay, now we just have one store left: 70 and the following packages: { 2, 4, 6, 7, 7, 11, 13, 14, 23, 29 }.

(6) Increase the number of perms by 1 and go to Step 5. For example, in our case we find that no 3-perm adds to over 95% of 70, but the 4 perm { 29, 23, 14, 4 } = 70. At the end we are left with packages { 2, 6, 7, 7, 11, 13 } that are left on the machines. Notice these are mostly the smaller packages.

Notice that perms are tested in reverse lexical order (biggest first). For example, if you have "abcde" where e is the biggest, then the reverse lexical order for 3-perms is:

cde
bde
ade
bce
ace
etc.

This algorithm is very simple and will yield a good result for your situation.

Upvotes: 1

Related Questions