Ralph

Reputation:

Algorithm for fair distribution of numbers into two sets

Given a set of n numbers (1 <= n <= 100), where each number is an integer between 1 and 450, we need to distribute the numbers into two sets A and B such that the following two conditions hold:

  1. The total numbers in each set differ by at most 1.
  2. The sum of all the numbers in A is as nearly equal as possible to the sum of all the numbers in B i.e. the distribution should be fair.

Can someone please suggest an efficient algorithm for solving the above problem?

Thank You.

Upvotes: 3

Views: 4220

Answers (13)

starblue

Reputation: 56752

Since the numbers are small, the problem is not intractable here (partitioning is only NP-hard when the numbers can be exponentially large).

To solve it you can use dynamic programming:

Make a three-dimensional table of booleans where true at t[s, n, i] means that the sum s can be reached with a subset of n elements below index i. To compute the value for t[s, n, i] check t[s, n, i-1] and t[s - a[i], n-1, i-1]. Then look through the table at second index n/2 to find the best solution.

Edit: You actually don't need the complete table at once. You can make a two dimensional table t_i[s, n] for each index i and compute the table for i from the table for i-1, so you only need two of these two-dimensional tables, which saves a lot of memory. (Thanks to Martin Hock.)
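As a sketch of that DP (a Python translation of my own, not starblue's code): `reachable[n]` holds the sums attainable with exactly `n` elements, and iterating `n` downward while processing one input element at a time plays the role of rolling the table over the index `i`, so only the current layer is kept:

```python
def best_half_sum(numbers):
    # reachable[n] = set of subset sums attainable with exactly n elements
    half = len(numbers) // 2
    reachable = [set() for _ in range(half + 1)]
    reachable[0].add(0)
    for x in numbers:
        # iterate n downward so each element is used at most once
        for n in range(half, 0, -1):
            reachable[n] |= {s + x for s in reachable[n - 1]}
    total = sum(numbers)
    # best achievable sum for the half-sized set, closest to total/2
    return min(reachable[half], key=lambda s: abs(total - 2 * s))

print(best_half_sum([10, 75, 30, 45, 25, 15, 20]))  # prints 110 (110 vs 110)
```

Recovering the actual subset, rather than just its sum, takes back-pointers or a walk back through the table, as described in the answer.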

Upvotes: 9

Sinan Ünür

Reputation: 118118

@ShreevatsaR notes that the algorithm below is known as the greedy algorithm. It does not do very well with certain inputs (I tried 10 different randomly generated sets of 100 inputs, and in all cases the sums were very close, which had led me to think that sorting the input was enough for this algorithm to succeed).

See also "The Easiest Hard Problem", American Scientist, March-April 2002, recommended by ShreevatsaR.

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my @numbers = generate_list();

print "@numbers\n\n";

my (@A, @B);
my $N = @numbers;

while ( @numbers ) {
    my $n = pop @numbers;
    printf "Step: %d\n", $N - @numbers;
    {
        no warnings 'uninitialized';

        if ( sum(@A) < sum(@B) ) {
            push @A, $n;
        }
        else {
            push @B, $n;
        }
        printf "A: %s\n\tsum: %d\n\tnum elements: %d\n",
            "@A", sum(@A), scalar @A;
        printf "B: %s\n\tsum: %d\n\tnum elements: %d\n\n",
            "@B", sum(@B), scalar @B;
    }
}

sub generate_list { grep { rand > 0.8 } 1 .. 450 }

Note that generate_list returns a list in ascending order.

Upvotes: 1

Jackson

Reputation: 5657

If you need the perfect answer then you have to generate and loop through all of the possible sets of answers. If a pretty good answer is all you need then a technique like simulated annealing is the way to go. Here's some C code that uses a very primitive cooling schedule to find an answer.

#include <stdio.h>
#include <stdlib.h>

#define MAXPAR 50
#define MAXTRIES 10000000

int data1[] = {192,130,446,328,40,174,218,31,59,234,26,365,253,11,198,98,
               279,6,276,72,219,15,192,289,289,191,244,62,443,431,363,10
              } ;
int data2[] = { 1,2,3,4,5,6,7,8,9 } ;

// What does the set sum to
int sumSet ( int data[], int len )
{
    int result = 0 ;
    for ( int i=0; i < len; ++i )
        result += data[i] ;
    return result ;
}

// Print out a set
void printSet ( int data[], int len )
{
    for ( int i=0; i < len; ++i )
        printf ( "%d ", data[i] ) ;
    printf ( " Sums to %d\n", sumSet ( data,len ) ) ;
}

// Partition the values using simulated annealing
void partition ( int data[], size_t len )
{
    int set1[MAXPAR] = {0} ;    // Partition 1
    int set2[MAXPAR] = {0} ;    // Partition 2
    int set1Pos, set2Pos, dataPos, set1Len, set2Len ;  // Data about the partitions
    int minDiff ; // The best solution found so far
    int sum1, sum2, diff ;
    int tries = MAXTRIES ; // Don't loop for ever

    set1Len = set2Len = 0 ;
    dataPos = 0 ;

    // Initialize the two partitions by dealing the numbers out alternately
    while ( dataPos < len )
    {
        set1[set1Len++] = data[dataPos++] ;
        if ( dataPos < len )
            set2[set2Len++] = data[dataPos++] ;
    }


    // Very primitive simulated annealing solution
    sum1 = sumSet ( set1, set1Len ) ;
    sum2 = sumSet ( set2, set2Len ) ;
    diff = sum1 - sum2 ;    // The initial difference - we want to minimize this
    minDiff = sum1 + sum2 ;
    printf ( "Initial diff is %d\n", diff ) ;

    // Loop until a solution is found or all tries are exhausted
    while ( diff != 0 && tries > 0 )
    {
        // Look for swaps that improve the difference
        int newDiff, newSum1, newSum2 ;
        set1Pos = rand() % set1Len ;
        set2Pos = rand() % set2Len ;

        newSum1 = sum1 - set1[set1Pos] + set2[set2Pos] ;
        newSum2 = sum2 + set1[set1Pos] - set2[set2Pos] ;
        newDiff = newSum1 - newSum2 ;
        if ( abs ( newDiff ) < abs ( diff ) ||      // Is this a better solution?
                tries/100 > rand() % MAXTRIES )     // Or shall we just swap anyway - chance of swap decreases as tries reduces
        {
            int tmp = set1[set1Pos] ;
            set1[set1Pos] = set2[set2Pos] ;
            set2[set2Pos] = tmp ;
            diff = newDiff ;
            sum1 = newSum1 ;
            sum2 = newSum2 ;

            // Print it out if it's the best we have seen so far
            if ( abs ( diff ) < abs ( minDiff ) )
            {
                minDiff = diff ;
                printSet ( set1, set1Len ) ;
                printSet ( set2, set2Len ) ;
                printf ( "diff of %d\n\n", abs ( diff ) ) ;
            }
        }


        --tries ;
    }

    printf ( "done\n" ) ;
}


int main ( int argc, char **argv )
{
    // Change this to seed rand from the clock if you don't want the same
    // results reproduced every time!
    srand ( 12345 ) ;
    partition ( data1, sizeof(data1)/sizeof(data1[0]) ) ;
    partition ( data2, 9 ) ;
    return 0;
}

Upvotes: 0

Stephen Denne

Reputation: 37007

Simulated annealing can quite quickly find better and better answers. You could keep constraint 1 satisfied throughout while improving how close the two sums in constraint 2 are.

Upvotes: 0

Patrice Bernassola

Reputation: 14416

I have an algorithm for you. It uses recursive and iterative concepts.

Assuming you have n numbers Xi with 1 <= n <= 100 and 1 <= Xi <= 450:

  1. If n < 3, distribute the numbers and stop the algorithm,

  2. If n > 2, sort your list of numbers in ascending order,

  3. Compute the total sum S of all numbers,
  4. Then divide S by (n - n%2)/2 to obtain the target value A,
  5. Now create pairs of numbers whose sums are as near as possible to A. Take the first number and find a second number such that their sum S1 is as near as possible to A. Put S1 in a new list of numbers, and remember how the sum was computed so the base numbers can be recovered later.
  6. Repeat 5. until fewer than 2 numbers remain in the list. Then add any remaining number to the sum list and restart the algorithm at step 1. with the new list.

Example:

Assuming: n = 7 and numbers are 10, 75, 30, 45, 25, 15, 20

Pass 1:

Since n > 2 so sort the list : 10, 15, 20, 25, 30, 45, 75

Sum S = 220

A = 220 / ((7-1)/2) = 73

Couples:

10 & 75 => 85

15 & 45 => 60

20 & 30 => 50

Fewer than 2 numbers remain, so add 25 to the sum list : 85(10,75), 60(15,45), 50(20,30), 25(25)

Pass 2:

n = 4 and numbers are 85, 60, 50, 25

The list count is > 2, so sort the list : 25(25), 50(20,30), 60(15,45), 85(10,75)

Sum S is still the same (S = 220) but A must be recomputed : A = 220 / ((4-0)/2) = 110

Couples:

25 & 85 => 110

50 & 60 => 110

The Sum list is : 110(25(25),85(10,75)), 110(50(20,30),60(15,45))

Pass 3:

n = 2 and numbers are 110, 110

n < 3 so distribute numbers:

A = 25, 10, 75

B = 20, 30, 15, 45

This worked in every scenario I tested.
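A rough Python translation of the passes above (my own sketch, not the author's code; each item carries the original numbers behind its combined sum so set A and set B can be recovered at the end):

```python
def pair_up(numbers):
    # each item is (combined_sum, original_numbers_behind_it)
    items = [(x, [x]) for x in numbers]
    while len(items) > 2:
        items.sort()                          # ascending by combined sum
        total = sum(s for s, _ in items)
        n = len(items)
        target = total / ((n - n % 2) / 2)    # the A value from step 4
        merged = []
        while len(items) >= 2:
            s1, parts1 = items.pop(0)         # smallest remaining sum
            # partner whose sum brings the pair closest to the target A
            j = min(range(len(items)),
                    key=lambda k: abs(s1 + items[k][0] - target))
            s2, parts2 = items.pop(j)
            merged.append((s1 + s2, parts1 + parts2))
        merged.extend(items)                  # at most one leftover number
        items = merged
    a = items[0][1]
    b = items[1][1] if len(items) > 1 else []
    return a, b

a, b = pair_up([10, 75, 30, 45, 25, 15, 20])
print(sorted(a), sorted(b))  # prints [10, 25, 75] [15, 20, 30, 45]
```

On the worked example this reproduces the passes shown above, ending with two groups that each sum to 110.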

Upvotes: 2

fortran

Reputation: 76057

I would give genetic algorithms a try, as this seems a very nice problem to apply them.

The codification is just a binary string of length N, where 0 means the element is in the first group and 1 in the second. Penalize the fitness when the number of elements in each group differs, and penalize it when the sums diverge... Something like:

fitness(gen) = (sum(gen) - n/2)^2 + (sum(values[i] * (-1)^gen[i] for i in 0..n-1))^2

(And minimize the fitness)

Of course this can give you a sub-optimal answer, but for large real world problems it's usually enough.
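For instance, that fitness could be written as follows (a hypothetical sketch; `gen` is the 0/1 chromosome and lower is better):

```python
def fitness(gen, values):
    # size penalty: 0 when the chromosome puts n/2 elements in each group
    n = len(values)
    size_penalty = (sum(gen) - n / 2) ** 2
    # sum penalty: squared difference between the two group sums,
    # since (-1)**0 = +1 for group 0 and (-1)**1 = -1 for group 1
    sum_penalty = sum(v * (-1) ** g for v, g in zip(values, gen)) ** 2
    return size_penalty + sum_penalty

print(fitness([0, 1, 0, 1], [1, 2, 3, 4]))  # prints 4.0 (group sums 4 vs 6)
```

Any GA library (or a hand-rolled mutate/crossover loop) can then minimize this over 0/1 strings.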

Upvotes: -1

Olexiy

Reputation: 1084

First, find a solution to the problem without the first constraint (i.e. - making sums as close as possible). This problem can be solved using DP approach (you can read more about DP here, and the first problem - about coins - is very similar to yours).

Once you can solve it, you can add one more state to your DP - the number of elements already selected into the subset. This gives you an N^3 algorithm.

Upvotes: 2

Falaina

Reputation: 6695

This is a constrained version of the Number Partitioning Problem. Usually the goal is to find any 2 disjoint subsets that minimize the difference of the sums. Your problem is constrained in the sense that you only consider one possibility: 2 sets of size N/2 (or one set of size N/2 and one of size N/2 + 1 if the total count is odd). This dramatically reduces the search space, but I can't think of a good algorithm at the moment; I'll think about it.

Upvotes: 3

mob

Reputation: 118595

Never mind, I thought the numbers were sequential. This looks kind of like the Knapsack Problem, which is NP hard.


The numbers are sequential?

  1. Put the largest number in A
  2. Put the next largest number in B
  3. Put the next largest number in B
  4. Put the next largest number in A
  5. Repeat from step 1 until all the numbers are assigned.

Proof:

After every multiple of 4 numbers has been assigned, A and B both contain the same number of items, and the sum of the items in each group is the same because

(n) + (n - 3) == (n - 1) + (n - 2)

In the last iteration we are at Step 1 above and we have either 0, 1 [1], 2 [1, 2], or 3 [1, 2, 3] numbers remaining.

In case 0, we are done and the groups are equal in count and weight.

In case 1, we assign the number 1 to group A. Group A has one more item and one more weight. This is as fair as we can get in this situation.

In case 2, we assign the number 2 to group A and the number 1 to group B. Now the groups have the same number of items and group A has one extra weight. Again, this is as fair as we can get.

In case 3, assign the number 3 to group A, and assign numbers 2 and 1 to group B. Now the groups have the same weight (3 == 2 + 1) and group B has one extra item.

Upvotes: 2

Esteban Araya

Reputation: 29664

Any dual knapsack algorithm will do (regardless of distribution of numbers).

Upvotes: 0

psychotik

Reputation: 39009

I assume the numbers are not sequential, and you can't re-balance?

Because of constraint 1, you're going to need to switch buckets every other insertion, always. So every time you're not forced to pick a bucket, pick a logical bucket (where adding the number would make the sum closer to the other bucket). If this bucket isn't the same one as your previous bucket, you get another turn where you're not forced.

Upvotes: 0

9b5b

Reputation: 1528

Your requirement in #2 needs clarification: "The sum of all the numbers in A is as nearly equal as possible to the sum of all the numbers in B" is clear, but the added statement "the distribution should be fair" makes everything unclear. What exactly does 'fair' mean? Does the process need a random element in it?

Upvotes: 1

RichH

Reputation: 6138

If the numbers are sequential then you just alternate assigning them between A and B.

I suspect they are not, in which case...

Assign the largest unassigned number to the group with the lower sum, unless the difference in size between the two groups has reached the count of unassigned numbers (in which case assign all of the remaining numbers to the smaller group).

It won't find the best solution in all cases, but it's close and simple.
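A sketch of that rule in Python (my own reading of it, not RichH's code):

```python
def greedy_split(numbers):
    a, b = [], []
    remaining = sorted(numbers, reverse=True)
    while remaining:
        # once the size gap can only be closed by giving the smaller group
        # everything that is left, do exactly that
        if abs(len(a) - len(b)) >= len(remaining):
            (a if len(a) < len(b) else b).extend(remaining)
            break
        x = remaining.pop(0)
        # otherwise the largest unassigned number goes to the lower-sum group
        (a if sum(a) <= sum(b) else b).append(x)
    return a, b

a, b = greedy_split([10, 75, 30, 45, 25, 15, 20])
print(sorted(a), sorted(b))  # prints [10, 25, 75] [15, 20, 30, 45]
```

The size check guarantees the counts end up differing by at most 1, even when the sum-based rule would keep feeding one group.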

Upvotes: 2
