Reputation: 1
I'm struggling with a part of my college assignment. I have two std::set containers holding pointers to quite complex objects, but ordered by different criteria (which is why I can't use std::set_intersection()). I need to find the elements contained in both subsets as fast as possible; the assignment has a time-complexity requirement.
I can do that in n*log(m) time, where n is the size of the first subset and m is the size of the second subset, by doing the following:
for (auto it = subset1.begin(); it != subset1.end(); it++) {
    if (std::find(subset2.begin(), subset2.end(), *it) != subset2.end())
        result.insert(*it);
}
This fails the time requirement, which asks for worst-case linear and better-than-linear average time.
I found the following question here and I find the hashtable approach interesting. However, I fear that creating the hashtable might incur too much overhead. The class contained in the sets looks something like this:
class containedInSets {
    // methods
private:
    std::vector<std::string> member1;
    SomeObject member2;
    int member3;
};
I have no control over the SomeObject class, and therefore cannot specify a hash function for it; I'd have to hash the pointer. Furthermore, the vector may grow quite large (into the thousands of entries).
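The pointer-hashing idea can be sketched as follows. Since the sets store pointers, std::hash for pointer types is already provided, so no hash function for SomeObject is needed; the names below are hypothetical, with toy int* sets standing in for the real pointer sets:

```cpp
#include <cassert>
#include <set>
#include <unordered_set>

// Build a hash table of one subset's pointer values, then probe it while
// walking the other subset. The table costs O(m) to build and each probe
// is O(1) on average, so the whole intersection is O(n + m) on average.
template <class SetA, class SetB, class OutSet>
void intersect_by_pointer(const SetA& a, const SetB& b, OutSet& result) {
    // Hash the raw pointer values; no hash for the pointee type is needed.
    std::unordered_set<typename SetA::key_type> lookup(a.begin(), a.end());
    for (const auto& p : b)
        if (lookup.count(p) != 0)
            result.insert(p);
}
```

This works even though the two std::set containers order their pointers by different criteria, because the hash table ignores the ordering entirely and compares pointer values directly.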
What is the quickest way of doing this?
Upvotes: 0
Views: 413
Reputation: 218278
Your code is not O(n*log(m)) but O(n*m): std::find(subset2.begin(), subset2.end(), *it) is linear. std::set, however, has find and count member functions which run in O(log(n)) (they do a binary search on the underlying tree).
So you can simply do:
for (const auto& e : subset1) {
    if (subset2.count(e) != 0) {
        result.insert(e);
    }
}
which has complexity O(n*log(m)) instead of your O(n*m).
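For completeness, here is the loop above wrapped in a self-contained helper (a toy version using plain int sets; the name intersect_sets is made up for illustration):

```cpp
#include <cassert>
#include <set>

// For each element of subset1, do an O(log m) member lookup in subset2,
// giving O(n * log(m)) overall instead of O(n * m).
template <class Set>
Set intersect_sets(const Set& subset1, const Set& subset2) {
    Set result;
    for (const auto& e : subset1)
        if (subset2.count(e) != 0)  // binary search in subset2's tree
            result.insert(e);
    return result;
}
```

Note that each std::set performs the lookup with its own comparator, so this works even when the two sets order their elements by different criteria.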
Upvotes: 3