Reputation:
Given a set of n numbers (1 <= n <= 100), where each number is an integer between 1 and 450, we need to distribute them into two sets A and B such that the following two conditions hold:
1. The number of elements in A and the number of elements in B differ by at most one.
2. The sum of all the numbers in A is as nearly equal as possible to the sum of all the numbers in B, i.e. the distribution should be fair.
Can someone please suggest an efficient algorithm for solving the above problem?
Thank You.
Upvotes: 3
Views: 4220
Reputation: 56752
Since the numbers are small, NP-completeness of the general partition problem is not an obstacle here: a pseudo-polynomial algorithm is fast enough.
To solve it you can use dynamic programming:
Make a three-dimensional table of booleans, where t[s, k, i] is true when the sum s can be reached with a subset of k elements chosen from the first i elements. To compute t[s, k, i], check t[s, k, i-1] and t[s - a[i], k-1, i-1]. Then scan the table at second index n/2 to find the reachable sum closest to half the total.
Edit: You actually don't need the complete table at once. You can keep a two-dimensional table t_i[s, k] for the current index i and compute it from the table for i-1, so you only ever need two of these two-dimensional tables, which saves a lot of memory. (Thanks to Martin Hock.)
Upvotes: 9
Reputation: 118118
@ShreevatsaR notes that the algorithm below is known as the greedy algorithm. It does not do well on certain pathological inputs, but on 10 randomly generated input sets of size 100 the sums came out very close in every case, which led me to think that sorting the input was enough for this algorithm to succeed.
See also "The Easiest Hard Problem", American Scientist, March-April 2002, recommended by ShreevatsaR.
#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my @numbers = generate_list();
print "@numbers\n\n";

my (@A, @B);
my $N = @numbers;

while ( @numbers ) {
    my $n = pop @numbers;
    printf "Step: %d\n", $N - @numbers;
    {
        no warnings 'uninitialized';
        if ( sum(@A) < sum(@B) ) {
            push @A, $n;
        }
        else {
            push @B, $n;
        }
        printf "A: %s\n\tsum: %d\n\tnum elements: %d\n",
            "@A", sum(@A), scalar @A;
        printf "B: %s\n\tsum: %d\n\tnum elements: %d\n\n",
            "@B", sum(@B), scalar @B;
    }
}

sub generate_list { grep { rand > 0.8 } 1 .. 450 }
Note that generate_list returns a list in ascending order.
Upvotes: 1
Reputation: 5657
If you need the perfect answer then you have to generate and loop through all of the possible sets of answers. If a pretty good answer is all you need then a technique like simulated annealing is the way to go. Here's some C code that uses a very primitive cooling schedule to find an answer.
#include <stdio.h>
#include <stdlib.h>

#define MAXPAR   50
#define MAXTRIES 10000000

int data1[] = {192,130,446,328,40,174,218,31,59,234,26,365,253,11,198,98,
               279,6,276,72,219,15,192,289,289,191,244,62,443,431,363,10} ;

int data2[] = { 1,2,3,4,5,6,7,8,9 } ;

// What the set sums to
int sumSet ( int data[], int len )
{
    int result = 0 ;
    for ( int i = 0; i < len; ++i )
        result += data[i] ;
    return result ;
}

// Print out a set
void printSet ( int data[], int len )
{
    for ( int i = 0; i < len; ++i )
        printf ( "%d ", data[i] ) ;
    printf ( " Sums to %d\n", sumSet ( data, len ) ) ;
}

// Partition the values using simulated annealing
void partition ( int data[], int len )
{
    int set1[MAXPAR] = {0} ;  // Partition 1
    int set2[MAXPAR] = {0} ;  // Partition 2
    int set1Pos, set2Pos, dataPos, set1Len, set2Len ;  // Data about the partitions
    int minDiff ;             // The best difference found so far
    int sum1, sum2, diff ;
    int tries = MAXTRIES ;    // Don't loop for ever

    set1Len = set2Len = 0 ;
    dataPos = 0 ;

    // Initialize the two partitions by dealing the values alternately
    while ( dataPos < len )
    {
        set1[set1Len++] = data[dataPos++] ;
        if ( dataPos < len )
            set2[set2Len++] = data[dataPos++] ;
    }

    // Very primitive simulated annealing solution
    sum1 = sumSet ( set1, set1Len ) ;
    sum2 = sumSet ( set2, set2Len ) ;
    diff = sum1 - sum2 ;      // The initial difference - we want to minimize this
    minDiff = sum1 + sum2 ;   // Larger than any possible difference
    printf ( "Initial diff is %d\n", diff ) ;

    // Loop until a solution is found or all tries are exhausted
    while ( diff != 0 && tries > 0 )
    {
        // Look for swaps that improve the difference
        int newDiff, newSum1, newSum2 ;
        set1Pos = rand() % set1Len ;
        set2Pos = rand() % set2Len ;
        newSum1 = sum1 - set1[set1Pos] + set2[set2Pos] ;
        newSum2 = sum2 + set1[set1Pos] - set2[set2Pos] ;
        newDiff = newSum1 - newSum2 ;

        if ( abs ( newDiff ) < abs ( diff ) || // Is this a better solution?
             tries/100 > rand() % MAXTRIES )   // Or shall we just swap anyway - chance of swap decreases as tries reduces
        {
            int tmp = set1[set1Pos] ;
            set1[set1Pos] = set2[set2Pos] ;
            set2[set2Pos] = tmp ;
            diff = newDiff ;
            sum1 = newSum1 ;
            sum2 = newSum2 ;

            // Print it out if it's the best we have seen so far
            if ( abs ( diff ) < abs ( minDiff ) )
            {
                minDiff = diff ;
                printSet ( set1, set1Len ) ;
                printSet ( set2, set2Len ) ;
                printf ( "diff of %d\n\n", abs ( diff ) ) ;
            }
        }
        --tries ;
    }

    printf ( "done\n" ) ;
}

int main ( int argc, char **argv )
{
    // Seed rand from the clock instead if you don't want the same
    // results reproduced every time!
    srand ( 12345 ) ;
    partition ( data1, 32 ) ;
    partition ( data2, 9 ) ;
    return 0;
}
Upvotes: 0
Reputation: 37007
Simulated annealing can quite quickly find better and better answers. You could keep condition 1 true throughout while improving how nearly the sums in condition 2 match.
Upvotes: 0
Reputation: 14416
I have an algorithm for you that uses recursive and iterative concepts.
Assume you have n numbers X1..Xn with 1 <= n <= 100 and 1 <= Xi <= 450.
If n < 3, distribute the numbers and stop the algorithm.
If n > 2, sort your list of numbers in ascending order, pair the smallest remaining number with the largest, and replace each pair by its sum; then repeat on the resulting sums.
Example:
Assuming: n = 7 and numbers are 10, 75, 30, 45, 25, 15, 20
Pass 1:
Since n > 2 so sort the list : 10, 15, 20, 25, 30, 45, 75
Sum S = 220
A = 220 / ((7-1)/2) = 73
Couples:
10 & 75 => 85
15 & 45 => 60
20 & 30 => 50
Fewer than 2 numbers remain, so add 25 to the sum list: 85(10,75), 60(15,45), 50(20,30), 25(25)
Pass 2:
n = 4 and numbers are 85, 60, 50, 25
List count is > 2 so sort list : 25(25), 50(20,30), 60(15,45), 85(10,75)
Sum S is still the same (S=220) but A must be recomputed: A = 220 / ((4-0)/2) = 110
Couples:
25 & 85 => 110
50 & 60 => 110
The Sum list is : 110(25(25),85(10,75)), 110(50(20,30),60(15,45))
Pass 3:
n = 2 and numbers are 110, 110
n < 3 so distribute numbers:
A = 25, 10, 75
B = 20, 30, 15, 45
This has worked on every scenario I have tested.
Upvotes: 2
Reputation: 76057
I would give genetic algorithms a try, as this seems a very nice problem to apply them.
The encoding is just a binary string of length n, where 0 means the element goes in the first group and 1 means it goes in the second. Penalize the fitness when the number of elements in each group differs, and penalize it when the sums differ... Something like:
fitness(gen) = (sum(gen) - n/2)^2 + (sum(values[i] * (-1)^gen[i] for i in 0..n-1))^2
(and minimize the fitness)
Of course this can give you a sub-optimal answer, but for large real world problems it's usually enough.
Upvotes: -1
Reputation: 1084
First, find a solution to the problem without the first constraint (i.e. making the sums as close as possible). This can be solved with a DP approach (you can read more about DP here; the first problem there - about coins - is very similar to yours).
Once you can solve that, add one more state to your DP - the number of elements selected for the subset so far. This gives you an N^3 algorithm.
Upvotes: 2
Reputation: 6695
This is a constrained version of the Number Partitioning Problem. Usually the goal is to find any 2 disjoint subsets that minimize the difference of the sums. Your problem is constrained in the sense that you only consider 1 possibility: 2 sets of size N/2 (or 1 set of size N/2 and one of size N/2+1 if the total number is uneven). This dramatically reduces the search space, but I can't think of a good algorithm at the moment; I'll think about it.
Upvotes: 3
Reputation: 118595
Never mind, I thought the numbers were sequential. This looks kind of like the Knapsack Problem, which is NP hard.
The numbers are sequential? Then work downward from n in blocks of four: assign n and n - 3 to group A, assign n - 1 and n - 2 to group B, and repeat from n - 4.
Proof:
After every block of 4 numbers has been assigned, A and B both contain the same number of items, and the sums of the items in each group are the same, because
(n) + (n - 3) == (n - 1) + (n - 2)
In the last iteration we have either 0, 1 [1], 2 [1, 2], or 3 [1, 2, 3] numbers remaining.
In case 0, we are done and the groups are equal in count and weight.
In case 1, we assign the number 1 to group A. Group A has one more item and one more weight. This is as fair as we can get in this situation.
In case 2, we assign the number 2 to group A and the number 1 to group B. Now the groups have the same number of items and group A has one extra weight. Again, this is as fair as we can get.
In case 3, assign the number 3 to group A, and assign numbers 2 and 1 to group B. Now the groups have the same weight (3 == 2 + 1) and group B has one extra item.
Upvotes: 2
Reputation: 29664
Any dual knapsack algorithm will do (regardless of distribution of numbers).
Upvotes: 0
Reputation: 39009
I assume the numbers are not sequential, and you can't re-balance?
Because of constraint 1, you're going to need to switch buckets every other insertion, always. So every time you're not forced to pick a bucket, pick a logical bucket (where adding the number would make the sum closer to the other bucket). If this bucket isn't the same one as your previous bucket, you get another turn where you're not forced.
Upvotes: 0
Reputation: 1528
Your requirement #2 needs clarification: "The sum of all the numbers in A is as nearly equal as possible to the sum of all the numbers in B" is clear, but the statement "the distribution should be fair" makes everything unclear. What exactly does 'fair' mean? Does the process need a random element in it?
Upvotes: 1
Reputation: 6138
If the numbers are sequential then you just alternate assigning them between A and B.
I suspect they are not, in which case...
Assign the largest unassigned number to the group with the lower sum, unless the difference in size of the groups is less than or equal to the count of unassigned numbers (in which case assign all of the remaining numbers to the smaller group).
It won't find the best solution in all cases, but it's close and simple.
Upvotes: 2