MChrudina
MChrudina

Reputation: 1

How to optimize a std::set intersection algorithm (C++)

I'm struggling with a part of my college assignment. I have two subsets of std::set containers containing pointers to quite complex objects, but ordered by different criteria (which is why I can't use std::set_intersection()). I need to find elements that are contained in both subsets as fast as possible. There is a time/complexity requirement on the assignment.

I can do that in n*log(m) time where n is the size of the first subset and m is the size of the second subset by doing the following:

for(auto it = subset1.begin(), it != subset1.end(), it++){
    if(find(subset2.begin(), subset2.end(), *it))
        result.insert(*it);
}

This fails the time requirement, which says worst case linear, but average better than linear.

I found the following question here and I find the hashtable approach interesting. However, I fear that the creation of the hashtable might incur too much overhead. The class contained in the sets looks something like this:

class containedInSets {
   //methods
private: 
    vector<string> member1;
    SomeObject member2;
    int member3;
}

I have no control over the SomeObject class, and therefore cannot specify a hash function for it. I'd have to hash the pointer. Furthermore, the vector may grow quite (in the thousands of entries).

What is the quickest way of doing this?

Upvotes: 0

Views: 413

Answers (1)

Jarod42
Jarod42

Reputation: 218278

Your code is not O(n log(m)) but O(n * m).

std::find(subset2.begin(), subset2.end(), *it) is linear, but std::set has methods find and count which are in O(log(n)) (they do a binary search).

So you can simply do:

for (const auto& e : subset1) {
    if (subset2.count(e) != 0) {
        result.insert(e);
    }
}

Which has complexity of n*log(m) instead of your n * m.

Upvotes: 3

Related Questions