NoSenseEtAl
NoSenseEtAl

Reputation: 30028

elegant way to remove all elements of a vector that are contained in another vector?

While looking over some code I found loopy and algorithmically slow implementation of std::set_difference :

 for(int i = 0; i < a.size(); i++)
 {
  iter = std::find(b.begin(),b.end(),a[i]);
  if(iter != b.end())
  {
     b.erase(iter);
  }
 }

It can be easily replaced with sort(vectors are not sorted) + set_difference, but that requires allocation of new memory(see my recent Q Can output of set difference be stored in first input? why it cant be done "inplace").
So my solution would be something like:

sort(a.begin(), a.end());
for(size_t i = 0; i < b.size(); i++)
{
 if (binary_search(a.begin(), a.end(), b[i]))
 {
     swap(b[i], b[b.size()-1]); //remove current element by swapping with last
     b.pop_back();     // and removing new last by shrinking
 }
}

can it be done more elegantly?
elegant is subjective so within scope of this Q is defined as clearer code(ideally something from STL algorithms but I think it cant be done) but with no memory allocation and no increase in alg complexity.

Upvotes: 13

Views: 18276

Answers (5)

NoSenseEtAl
NoSenseEtAl

Reputation: 30028

after thinking for a while I thought of this
(note:by answering my own Q im not claiming this is a superior to offered A):

vector<int64_t> a{3,2,7,5,11,13}, b{2,3,13,5};
set<int64_t> bs(b.begin(), b.end());
for (const auto& num: bs)
    cout << num << " ";
cout  << endl;
for (const auto& num: a)
    bs.erase(num);
vector<int64_t> result(bs.begin(), bs.end());
for (const auto& num: result)
    cout << num << " ";

Upvotes: 1

Ralph Tandetzky
Ralph Tandetzky

Reputation: 23610

Sort b so you can binary search it in order to reduce time complexity. Then use the erase-remove idiom in order to throw away all elements from a that are contained in b:

sort( begin(b), end(b) );
a.erase( remove_if( begin(a),end(a),
    [&](auto x){return binary_search(begin(b),end(b),x);}), end(a) );

Of course, you can still sacrifice time complexity for simplicity and reduce your code by removing the sort() and replacing binary_search() by find():

a.erase( remove_if( begin(a),end(a),
    [&](auto x){return find(begin(b),end(b),x)!=end(b);}), end(a) );

This is a matter of taste. In both cases you don't need heap allocations. By the way, I'm using lambda auto parameters which are C++14. Some compilers already implement that feature such as clang. If you don't have such a compiler, but only C++11 then replace auto by the element type of the container.

By the way, this code does not mention any types! You can write a template function so it works for all kind of types. The first variant requires random access iteration of b while the second piece of code does not require that.

Upvotes: 10

n. m. could be an AI
n. m. could be an AI

Reputation: 119877

This one does it in O(N+M), assuming both arrays are sorted.

  auto ib = std::begin(two);
  auto iter = std::remove_if (
       std::begin(one), std::end(one),
       [&ib](int x) -> bool {
                       while  (ib != std::end(two) && *ib < x) ++ib;
                       return (ib != std::end(two) && *ib == x);
                     });

Upvotes: 10

Adrian McCarthy
Adrian McCarthy

Reputation: 47962

The current code is quite clear, in that it should be obvious to any programmer what's going on.

The current performance is O(a.size() * b.size()), which may be pretty bad depending upon the actual sizes.

A more concise and STL-like way to describe it is to use remove_if with a predicate that tells you if a value in in a.

b.erase(std::remove_if(b.begin(), b.end(), [](const auto&x) {
  return std::find(a.begin(), a.end(), x) != a.end();
}), b.end());

(Not tested, so I might have made a syntax error.) I used a lambda, but you can create a functor if you're not using a C++11 compiler.

Note that the original code removes just one instance of a value in b that's also in a. My solution will remove all instances of such a value from b.

Note that the find operation happens again and again, so it's probably better to do that on the smaller vector for better locality of reference.

Upvotes: 1

bstamour
bstamour

Reputation: 7766

One solution that comes to mind is combining remove_if and binary_search. It's effectively the same as your manual looping solution but might be a bit more "elegant" as it uses more STL features.

sort(begin(b), end(b));
auto iter = remove_if(begin(a), end(a), 
                      [](auto x) { 
                          return binary_search(begin(b), end(b), x); 
                      });
// Now [begin(a), iter) defines a new range, and you can erase them however
// you see fit, based on the type of a.

Upvotes: 4

Related Questions