Reputation: 10798
This code takes 9 minutes to run for a set of 5,600 objects:
public Set<UnDirectedPair<T>> getAllUndirectedPairs(Set<T> setObjects) {
Set<T> setObjectsProcessed = new TreeSet();
Set<UnDirectedPair<T>> setPairs;
setPairs = new TreeSet();
Iterator<T> setObjectsIteratorA = setObjects.iterator();
Iterator<T> setObjectsIteratorB;
T currTA;
T currTB;
while (setObjectsIteratorA.hasNext()) {
currTA = setObjectsIteratorA.next();
setObjectsProcessed.add(currTA);
setObjectsIteratorB = setObjects.iterator();
while (setObjectsIteratorB.hasNext()) {
currTB = setObjectsIteratorB.next();
if (!setObjectsProcessed.contains(currTB) && !currTA.equals(currTB)) {
setPairs.add(new UnDirectedPair(currTA, currTB));
}
}
setObjectsProcessed.add(currTA);
}
return setPairs;
}
Looking for a way to dramatically reduce the running time... ideas?
[BACKGROUND] The set contains Persons. There are duplicates in the set (same persons, but with slightly different attributes because errors at input time). I have methods which take 2 Persons and make the necessary corrections. So, as a preliminary step, I need to create a Set of Pairs of (Person, Person) which will be fed to these methods.
Upvotes: 0
Views: 987
Reputation: 10798
Thanks for good suggestions.
The basic impairment was my class UnDirectedPair
which had expensive equals
and compareTo
methods. I replaced it with a stripped bare Pair class.
This got the code to run in approx 10s.
Still, using operations on sets seemed costly. With @mawia suggestion modified a bit, sets can be left completely out of the picture. The final code runs in under 2 seconds instead of 9mn 40s - returning a list of 19,471,920 Pair objects!!
public List<Pair<T>> getAllUndirectedPairsAsList(Set<T> setObjects) {
List<T> listObjects = new ArrayList();
listObjects.addAll(setObjects);
List<Pair<T>> listPairs = new ArrayList();
Iterator<T> listIterator1 = listObjects.listIterator();
Iterator<T> listIterator2;
int count = 1;
T object1;
while (listIterator1.hasNext()) {
object1 = listIterator1.next();
listIterator2 = listObjects.listIterator(count++);
while (listIterator2.hasNext()) {
listPairs.add(new Pair(object1, listIterator2.next()));
}
}
return listPairs;
}
Upvotes: 0
Reputation: 9349
One trick I will suggest will be to maintain counter of both outer and inner loop.
int outerCount=0;
while (setObjectsIteratorA.hasNext()) {
currTA = setObjectsIteratorA.next();
setObjectsProcessed.add(currTA);
setObjectsIteratorB = setObjects.iterator();
int innerCount=0;
while (setObjectsIteratorB.hasNext()) {
currTB = setObjectsIteratorB.next();
if (innerCount++>outerCount && !currTA.equals(currTB)) {
setPairs.add(new UnDirectedPair(currTA, currTB));
}
}
outerCount++;
setObjectsProcessed.add(currTA);
}
return setPairs;
This will save last contains,a logN, operation.
The logic behind is:since the two Iterator are on the same set, and sole aim of ObjectProcessedSet is to maintain the record of processed Object, you can achieve the same comparing there index.
Example
Set1={1,1,2,4,5}
Iterator1 iteratorOuter=Set1.Iterator();
int outerCount=0;
while(iteratorOuter.hasNext()){
Iterator2 iteratorInner=Set1.Iterator();
int currA=iteratorOuter.next();
while(iteratorInner.hasNext()){
int CurrB=iteratorInner.next();
//Now here if CurraA=4 and CurrB=2 it is obvious it has been processed
//If currB =5 it is obviously has not been processed.
}
}
Upvotes: 1
Reputation: 11423
One solution which should give you a pretty good speedup is to sort the set first, and then compare only adjacent entries in the set.
Of course this means you need to have a comparable key for each Person
(e.g., it's name), and this key needs to be the same for all duplicates.
Then your code could look something like this:
SortedSet<Person> persons = new TreeSet<>(...);
Person last = null;
for (Person current : persons) {
if (last != null) {
setPairs.add(new UnDirectedPair(last, current));
}
last = current;
}
If Person
does not implement Comparable
(or compares by wrong fields), you can specify a Comparator
when creating the TreeSet
.
This solution runs in O(n*log n), and you have only O(n) pairs to work on afterwards. For only 5600 persons this should be very quick.
You could also make setPairs
a List
in this case to gain some more performance (although very little). Or you don't create the set of pairs at all, and just call your method for correcting Person
objects directly in the loop.
Upvotes: 0