Peck3277
Peck3277

Reputation: 1423

Java: Comparing hashsets as efficently as possible

I have 3 hashsets. goodLinkSet, badLinkSet and testLinkSet.

goodLinkSet holds a list of URLs that work and badLinkSet holds a list of URLs that don't work. testLinkSet holds a list of URLs that I need to check if they are good are bad, some of the links in here have been tested already in the other two sets.

What I want to do is remove all the strings/links in testLinkSet that appear in goodLinkSet and badLinkSet so I'm not testing URLs multiple times. I want to do this as efficiently and as fast as possible. A for each loop seems to be a bit slow.

What's the most efficient way of running this? Are there any functions that does this for me? Any advice would be very much appreciated!

Upvotes: 0

Views: 631

Answers (3)

CAFxX
CAFxX

Reputation: 30301

You should check at insertion time:

boolean addToTestLinkSet(String str) {
  if (goodLinkSet.contains(str) || badLinkSet.contains(str))
    return false;
  testLinkSet.add(str);
  return true;
}

contains() on HashSets is O(1), so the overhead should be pretty low.

The solution is pretty similar to Peter's but has the added bonus of using less memory (because it will avoid temporarily storing useless entries in testLinkSet).

Additionally, if you know that badLinkSet.size() > goodLinkSet.size(), you may even swap the order in which the two sets are tested.

Upvotes: 1

Peter Lawrey
Peter Lawrey

Reputation: 533530

What I want to do is remove all the strings/links in testLinkSet that appear in goodLinkSet and badLinkSet so I'm not testing URLs multiple times.

The most efficient way is to not remove the entries but to test them as you need to.

for(URL url: testLinkSet) {
    if(goodLinkSet.conatins(url) || badListSet.conatins(url)) continue;

    // test url
}

This does far less work as it does the same amount of tests, but avoid modifying anything.

Upvotes: 3

assylias
assylias

Reputation: 328629

What I want to do is remove all the strings/links in testLinkSet that appear in goodLinkSet and badLinkSet so I'm not testing URLs multiple times.

testLinkSet.removeAll(goodLinkSet);
testLinkSet.removeAll(badLinkSet);

This will run a loop internally, but unless you have (many) millions of links you won't have time to count to 1 before it completes.

If you need better performance, you should track each individual link and remove/add them as and when they get tested.

Upvotes: 6

Related Questions