seinecle
seinecle

Reputation: 10798

optimization: double loop through a set in java

This code takes 9 minutes to run for a set of 5,600 objects:

public Set<UnDirectedPair<T>> getAllUndirectedPairs(Set<T> setObjects) {
    Set<T> setObjectsProcessed = new TreeSet();
    Set<UnDirectedPair<T>> setPairs;
    setPairs = new TreeSet();
    Iterator<T> setObjectsIteratorA = setObjects.iterator();
    Iterator<T> setObjectsIteratorB;
    T currTA;
    T currTB;
    while (setObjectsIteratorA.hasNext()) {
        currTA = setObjectsIteratorA.next();
        setObjectsProcessed.add(currTA);
        setObjectsIteratorB = setObjects.iterator();
        while (setObjectsIteratorB.hasNext()) {
            currTB = setObjectsIteratorB.next();
            if (!setObjectsProcessed.contains(currTB) && !currTA.equals(currTB)) {
                setPairs.add(new UnDirectedPair(currTA, currTB));
            }
        }
        setObjectsProcessed.add(currTA);
    }
    return setPairs;

}

Looking for a way to dramatically reduce the running time... ideas?

[BACKGROUND] The set contains Persons. There are duplicates in the set (same persons, but with slightly different attributes because errors at input time). I have methods which take 2 Persons and make the necessary corrections. So, as a preliminary step, I need to create a Set of Pairs of (Person, Person) which will be fed to these methods.

Upvotes: 0

Views: 987

Answers (3)

seinecle
seinecle

Reputation: 10798

Thanks for good suggestions.

The basic impairment was my class UnDirectedPair which had expensive equals and compareTo methods. I replaced it with a stripped bare Pair class. This got the code to run in approx 10s.

Still, using operations on sets seemed costly. With @mawia suggestion modified a bit, sets can be left completely out of the picture. The final code runs in under 2 seconds instead of 9mn 40s - returning a list of 19,471,920 Pair objects!!

public List<Pair<T>> getAllUndirectedPairsAsList(Set<T> setObjects) {
    List<T> listObjects = new ArrayList();
    listObjects.addAll(setObjects);

    List<Pair<T>> listPairs = new ArrayList();
    Iterator<T> listIterator1 = listObjects.listIterator();
    Iterator<T> listIterator2;
    int count = 1;
    T object1;
    while (listIterator1.hasNext()) {
        object1 = listIterator1.next();
        listIterator2 = listObjects.listIterator(count++);
        while (listIterator2.hasNext()) {
            listPairs.add(new Pair(object1, listIterator2.next()));
        }
    }
    return listPairs;
}

Upvotes: 0

mawia
mawia

Reputation: 9349

One trick I will suggest will be to maintain counter of both outer and inner loop.

int outerCount=0;
while (setObjectsIteratorA.hasNext()) {
    currTA = setObjectsIteratorA.next();
    setObjectsProcessed.add(currTA);
    setObjectsIteratorB = setObjects.iterator();
    int innerCount=0;
    while (setObjectsIteratorB.hasNext()) {
        currTB = setObjectsIteratorB.next();
        if (innerCount++>outerCount && !currTA.equals(currTB)) {
            setPairs.add(new UnDirectedPair(currTA, currTB));
        }
    }
 outerCount++;
    setObjectsProcessed.add(currTA);
}
return setPairs;

This will save last contains,a logN, operation.

The logic behind is:since the two Iterator are on the same set, and sole aim of ObjectProcessedSet is to maintain the record of processed Object, you can achieve the same comparing there index.

Example

  Set1={1,1,2,4,5}
  Iterator1 iteratorOuter=Set1.Iterator();


  int outerCount=0;
  while(iteratorOuter.hasNext()){
           Iterator2 iteratorInner=Set1.Iterator();
           int currA=iteratorOuter.next();
      while(iteratorInner.hasNext()){
           int CurrB=iteratorInner.next();
           //Now here if CurraA=4 and CurrB=2 it is obvious it has been processed
          //If currB =5 it is obviously has not been processed.
     }
  }

Upvotes: 1

Philipp Wendler
Philipp Wendler

Reputation: 11423

One solution which should give you a pretty good speedup is to sort the set first, and then compare only adjacent entries in the set.

Of course this means you need to have a comparable key for each Person (e.g., it's name), and this key needs to be the same for all duplicates.

Then your code could look something like this:

SortedSet<Person> persons = new TreeSet<>(...);
Person last = null;
for (Person current : persons) {
  if (last != null) {
    setPairs.add(new UnDirectedPair(last, current));
  }
  last = current;
}

If Person does not implement Comparable (or compares by wrong fields), you can specify a Comparator when creating the TreeSet.

This solution runs in O(n*log n), and you have only O(n) pairs to work on afterwards. For only 5600 persons this should be very quick.

You could also make setPairs a List in this case to gain some more performance (although very little). Or you don't create the set of pairs at all, and just call your method for correcting Person objects directly in the loop.

Upvotes: 0

Related Questions