Luis Eduardo Santos
Luis Eduardo Santos

Reputation: 31

Combination Algorithm from multiple sets

I am trying to write an algorithm that tells me how many pairs I could generate with items coming from multiple set of values. For example I have the following sets: {1,2,3} {4,5} {6}

From these sets I can generate 11 pairs: {1,4}, {1,5}, {1,6}, {2,4}, {2,5}, {2,6}, {3,4}, {3,5}, {3,6}, {4,6}, {5,6}

I wrote the following algorithm:

int result=0;

for(int k=0;k<numberOfSets;k++){ //map is a list where I store all my sets
    int size1 = map.get(k);

    for(int l=k+1;l<numberOfSets;l++){
        int size2 = map.get(l);
        result += size1*size2;
    }
}

But as you can see the algorithm is not very scalable. If the number of sets increases the algorithm starts performing very poorly. Am I missing something?, Is there an algorithm that can help me with this ? I have been looking to combination and permutation algorithms but I am not very sure if thats the right path for this.

Thank you very much in advance

Upvotes: 2

Views: 851

Answers (2)

PartlyCloudy
PartlyCloudy

Reputation: 701

Assuming you only care about the number of pairs, and are counting duplicates, then there is a more efficient algorithm:

We will keep track of the current number of sets, and the number of elements which we encountered so far.

  1. Go over the list from the end to the start
  2. For each new set, the number of new pairs we can make is the size of the set * the size of encountered elements. Add this to the current number of sets.
  3. Add the size of the new set to the number of elements which we encountered so far.

The code:

int numberOfPairs=0;
int elementsEncountered=0;

for(int k = numberOfSets - 1 ; k >= 0 ; k--) { 
   int sizeOfCurrentSet = map.get(k);
   int numberOfNewPairs = sizeOfCurrentSet * elementsEncountered;
   numberOfPairs += numberOfNewPairs;
   elementsEncountered += sizeOfCurrentSet; 
}

The key point to relize is that when we count the number of new pairs that each set contributes, it doesn't matter from which set we select the second element of the pair. That is, we don't need to keep track of any set which we have already analyzed.

Upvotes: 0

Adrian Colomitchi
Adrian Colomitchi

Reputation: 3992

First at all, if the order in the pairs does matter, then starting with int l=k+1 in the inner cycle is erroneous. E.g. you are missing {4,1} if you consider it equal with {1,4}, then the result is correct, otherwise it isn't.

Second, to complicate the matter further, you don't say if the the pairs need to be unique or not. E.g. {1,2} , {2,3}, {4} will generate {2,4} twice - if you need to count it as unique, the result of your code is incorrect (and you will need to keep a Set<Pair<int,int>> to remove the duplicates and you will need to scan those sets and actually generate the pairs).

The good news: while you can't do better than O(N2) just for counting the pairs, even if you have thousands of sets, the millions of integral multiplication/additions are fast enough on nowaday computers - e.g Eigen deals quite well with O(N^3) operations for floating multiplications (see matrix multiplication operations).

Upvotes: 1

Related Questions