Reputation: 1423
I have 3 hashsets. goodLinkSet, badLinkSet and testLinkSet.
goodLinkSet holds a list of URLs that work and badLinkSet holds a list of URLs that don't work. testLinkSet holds a list of URLs that I need to check if they are good are bad, some of the links in here have been tested already in the other two sets.
What I want to do is remove all the strings/links in testLinkSet that appear in goodLinkSet and badLinkSet so I'm not testing URLs multiple times. I want to do this as efficiently and as fast as possible. A for each loop seems to be a bit slow.
What's the most efficient way of running this? Are there any functions that does this for me? Any advice would be very much appreciated!
Upvotes: 0
Views: 631
Reputation: 30301
You should check at insertion time:
boolean addToTestLinkSet(String str) {
if (goodLinkSet.contains(str) || badLinkSet.contains(str))
return false;
testLinkSet.add(str);
return true;
}
contains()
on HashSet
s is O(1), so the overhead should be pretty low.
The solution is pretty similar to Peter's but has the added bonus of using less memory (because it will avoid temporarily storing useless entries in testLinkSet
).
Additionally, if you know that badLinkSet.size() > goodLinkSet.size()
, you may even swap the order in which the two sets are tested.
Upvotes: 1
Reputation: 533530
What I want to do is remove all the strings/links in testLinkSet that appear in goodLinkSet and badLinkSet so I'm not testing URLs multiple times.
The most efficient way is to not remove the entries but to test them as you need to.
for(URL url: testLinkSet) {
if(goodLinkSet.conatins(url) || badListSet.conatins(url)) continue;
// test url
}
This does far less work as it does the same amount of tests, but avoid modifying anything.
Upvotes: 3
Reputation: 328629
What I want to do is remove all the strings/links in testLinkSet that appear in goodLinkSet and badLinkSet so I'm not testing URLs multiple times.
testLinkSet.removeAll(goodLinkSet);
testLinkSet.removeAll(badLinkSet);
This will run a loop internally, but unless you have (many) millions of links you won't have time to count to 1 before it completes.
If you need better performance, you should track each individual link and remove/add them as and when they get tested.
Upvotes: 6