Reputation:
Obviously doing std::set_intersection()
is a waste of time.
Isn't there a function in the algorithm header for doing exactly this?
std::find_first_of()
is doing a linear search as far as I understand.
Upvotes: 6
Views: 9867
Reputation: 275310
This is a solution only for std::set
(or multi). A solution for map would require only a bit more work.
I try it 3 ways.
First, if one is far larger than the other, I simply look for all of the elements of one in the other. Then vice versa.
The constant 100
is wrong theoretically. It should be k n lg m > m
for some k, not 100 n > m
for optimal big-O performance: but the constant factor is large, and 100>lg m, so really one should experiment.
If that isn't the case, we walk through each collection looking for collisions much like set_intersection
. Instead of just ++
, we use .lower_bound
to try to skip through each list faster.
Note that if your list consists of interleaved elements (like {1,3,7}
and {0,2,4,6,8}
) this will be slower than just ++ by a logarithmic factor.
If the two sets "cross" each other less often, this can skip over large amounts of each set's contents.
Replace the lower_bound
portion with a mere ++
if you want to compare the two behaviors.
template<class Lhs, class Rhs>
bool sorted_has_overlap( Lhs const& lhs, Rhs const& rhs ) {
if (lhs.empty() || rhs.empty()) return false;
if (lhs.size() * 100 < rhs.size()) {
for (auto&& x:lhs)
if (rhs.find(x)!=rhs.end())
return true;
return false;
}
if (rhs.size() * 100 < lhs.size()) {
for(auto&& x:rhs)
if (lhs.find(x)!=lhs.end())
return true;
return false;
}
using std::begin; using std::end;
auto lit = begin(lhs);
auto lend = end(lhs);
auto rit = begin(rhs);
auto rend = end(rhs);
while( lit != lend && rit != rend ) {
if (*lit < *rit) {
lit = lhs.lower_bound(*rit);
continue;
}
if (*rit < *lit) {
rit = rhs.lower_bound(*lit);
continue;
}
return true;
}
return false;
}
a sorted array could do the 3rd choice of algorithm and use std::lower_bound
to do fast advance of the "other" container. This has the advantage of using partial searches (which you cannot do fast in a set
). It will also behave poorly on "interleaved" elements (by a log n factor) compared to naive ++
.
The first two can also be done fast with sorted arrays, replacing method calls with calls to algorithms in std
. Such a transformation is basically mechanical.
An asymptotically optimal version on a sorted array would use a binary search biased towards finding lower bounds at the start of the list -- search at 1, 2, 4, 8, etc instead of at half, quarters, etc. Note that this has the same lg(n) worst case, but is O(1) if the searched for element is first instead of O(lg(n)). As that case (where the search advances less) means less global progress is made, optimizing the sub-algorithm for that case gives you a better global worst-case speed.
To get why, on "fast alternation" it wouldn't perform any worse than ++
-- the case where the next element is a sign swap takes O(1) operations, and it replaces O(k) with O(lg k) if the gap is larger.
However, by this point we are far, far down an optimization hole: profile, and determine if it is worth it before proceeding this way.
Another approach on sorted arrays is to presume that std::lower_bound
is written optimally (on random access iterators). Use an output iterator that throws an exception if written to. Return true iff you catch that exception, false otherwise.
(the optimizations above -- pick one and bin search the other, and exponential advance searching -- may be legal for a std::set_intersection
.)
I think the use of 3 algorithms is important. Set intersection testing where one side is much smaller that the other is probably common: the extreme case of one element on one side, and many on the other is very well known (as a search).
A naive 'double linear' search gives you up to linear performance in that common case. By detecting the assymmetry between sides, you can switch over to 'linear in small, log in large' at an opportune point, and have the much better performance in those cases. O(n+m) vs O(m lg n) -- if m < O(n/lg n) the second beats the first. If m is a constant, then we get O(n) vs O(lg n) -- which includes the edge case of 'use function to find if a single element is in some large collection'.
Upvotes: 7
Reputation: 694
I think you can make a binary_search
#include <set>
#include <iostream>
#include <algorithm>
bool overlap(const std::set<int>& s1, const std::set<int>& s2)
{
for( const auto& i : s1) {
if(std::binary_search(s2.begin(), s2.end(), i))
return true;
}
return false;
}
int main()
{
std::set<int> s1 {1, 2, 3};
std::set<int> s2 {3, 4, 5, 6};
std::cout << overlap(s1, s2) << '\n';
}
Upvotes: 0
Reputation: 1321
Sometimes you can encode sets of numbers in a single memory word. For example, you could encode the set {0,2,3,6,7}
in the memory word: ...00000011001101
. The rule is: the bit at position i
(reading from right to left) is up, if and only if the number i
is in the set.
Now if you have two sets, encoded in the memory words a
and b
, you can perform the intersection using the bitwise operator &
.
int a = ...;
int b = ...;
int intersection = a & b;
int union = a | b; // bonus
The good thing of this style, is that the intersection ( union, complementation ) is performed in one cpu instruction (I don't know if this is the correct term).
You could use more than one memory word, if you need to handle numbers that are greater than the number of bits of a memory word. Normally, I use an array of memory words.
If you want handle negative numbers, just use two arrays, one for negative numbers, and one for positive numbers.
The bad thing of this method, is that it works only with integers.
Upvotes: 5
Reputation: 20015
You can use the following template function if the inputs are sorted:
template<class InputIt1, class InputIt2>
bool intersect(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2)
{
while (first1 != last1 && first2 != last2) {
if (*first1 < *first2) {
++first1;
continue;
}
if (*first2 < *first1) {
++first2;
continue;
}
return true;
}
return false;
}
You can use like this:
#include <iostream>
int main() {
int a[] = {1, 2, 3};
int b[] = {3, 4};
int c[] = {4};
std::cout << intersect(a, a + 3, b, b + 2) << std::endl;
std::cout << intersect(b, b + 2, c, c + 1) << std::endl;
std::cout << intersect(a, a + 3, c, c + 1) << std::endl;
}
Result:
1
1
0
This function has complexity O(n + m)
where n
, m
are input sizes. But if one input is very small to the other (n << m for example), it's better to check each of the n
elements with binary search if it belongs to the other input. This gives O(n * log(m))
time.
#include <algorithm>
template<class InputIt1, class InputIt2>
/**
* When input1 is much smaller that input2
*/
bool intersect(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2) {
while (first1 != last1)
if (std::binary_search(first2, last2, *first1++))
return true;
return false;
}
Upvotes: 6