Steg Verner
Steg Verner

Reputation: 913

intersecting vectors in c++

I have a vector<vector<int> > A; of size 44,000. Now I need to intersect 'A' with another vector: vector<int> B of size 400,000. The size of inner vectors of A i.e. vector is variable and is of maximum size of 9,000 elements for doing the same I am using the following code:

for(int i=0;i<44000;i++)
  vector<int> intersect;
  set_intersection(A[i].begin(),A[i].end(),B.begin(),B.end(),
                std::back_inserter(intersect));

Is there some way by which I may make the code efficient. All the elements in vector A are sorted i.e. they are of the form ((0,1,4,5),(7,94,830,1000)), etc. That is, all elements of A[i]'s vector < all elements of A[j]'s vector if i<j.

EDIT: One of the solutions which I thought about is to merge all the A[i]'s together into another vector mergedB using:

  vector<int> mergedB;
  for(int i=0;i<44000;i++)
       mergedB.insert(mergedB.begin(),mergedB.end(),A[i])
  vector<int> intersect;
  set_intersection(mergedB.begin(),mergedB.end(),B.begin(),B.end(),
                std::back_inserter(intersect));

However, I am not getting the reason as to why am I getting almost same performance with both the codes. Can someone please help me understand this

Upvotes: 1

Views: 256

Answers (2)

doron
doron

Reputation: 28882

Since your vectors are sorted, the simplest and fastest algorithm will be to

  1. Set the current element of both vectors to the first value
  2. Compare the both current elements. If equal you have an interection, so increment both vectors'
  3. If not equal increment the vector with the smallest current element.
  4. Goto 2.

Upvotes: 0

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275385

As it happens, set_itersection is easy to write.

A fancy way would be to create a concatenating iterator, and go over each element of the lhs vector. But it is easier to write set_intersection manually.

template<class MetaIt, class FilterIt, class Sink>
void meta_intersect(MetaIt mb, MetaIt me, FilterIt b, FilterIt e, Sink sink) {
  using std::begin; using std::end;
  if (b==e) return;
  while (mb != me) {
    auto b2 = begin(*mb);
    auto e2 = end(*mb);
    if (b2==e2) {
      ++mb;
      continue;
    }
    do {
      if (*b2 < *b) {
        ++b2;
        continue;
      }
      if (*b < *b2) {
        ++b;
        if (b==e) return;
        continue;
      }
      *sink = *b2;
      ++sink; ++b; ++b2;
      if (b==e) return;
    } while (b2 != e2);
    ++mb;
  }
}

this does not copy elements, other than into the output vector. It assumes MetaIt is an iterator to containers, FilterIt is an iterator to a compatible container, and Sink is an output iterator.

I attempted to remove all redundant comparisons while keeping the code somewhat readable. There is one redundant check -- we check b!=e and then b==e in the single case where we run out of rhs contents. As this should only happen once, the cost to clarity isn't worth it.

You could possibly make the above more efficient with vectorization on modern hardware. I'm not an expert at that. Mixing vectorization with the meta-iteration is tricky.

Upvotes: 1

Related Questions