devil0150
devil0150

Reputation: 1469

Erase some of a vector's elements in a for-each loop without iterating the whole vector

I have a vector and I am searching for an element in it while iterating over the vector with a for-each loop. If I find any invalid elements during the search, I would like to remove them from the vector.

Basically, I want to do something like this:

for (auto el : vec) {
    if (el == whatImLookingFor) {
        return el;
    } else if (isInvalid(el)) {
        vec.erase(el);
    }
}

I looked at some other questions like this and this, but both recommend using std::remove_if. That will iterate over the whole vector and remove all invalid elements, instead of iterating only until I find the element I'm looking for, and ignoring any elements after that.

What would be a good way to do this?

Upvotes: 12

Views: 2176

Answers (5)

J. Merdich
J. Merdich

Reputation: 338

I was curious about the performance of this, so I ran a quick naive benchmark comparing Benjamin's find then partial clean and hnefatl's for-loop. Benjamin's is indeed faster: 113x faster. Impressive.

Naive comparison

But I was curious where most of the computation was going, as it was greater than the sum of remove_if and find, which are the only functions that actually iterate through the array. As it turns out, however, the vec.erase line in his code is actually pretty slow. This is because in remove_if he is cleaning the area from the start to the location of the found value, which results in a gap in the middle of the array from the invalid values. vec.erase must then copy the remaining values to fill the gap and finally resize the array.

I ran a third test with a full-sized remove_if/vec.erase followed by a find, so the gap happens at the end and no copy is necessary, just truncation. It turned out to be about 15% faster and leaves the whole vector clean. Note that this assumes that testing for validity is cheap. If it's more than a few simple comparisons, Benjamin's answer will win hands-down, as pointed out by Chris in the comments.

Microptimization benchmark

The code

auto p = std::remove_if(vec.begin(), vec.end(), isInvalid);
vec.erase(p, vec.end());
return std::find(vec.begin(), vec.end(), whatImLookingFor);

The Benchmark

Upvotes: 8

Benjamin Lindley
Benjamin Lindley

Reputation: 103751

You should still use std::remove_if, just call std::find beforehand.

auto el = std::find(vec.begin(), vec.end(), whatImLookingFor);
auto p = std::remove_if(vec.begin(), el, isInvalid);

// returns the iterator, not the element itself.
// if the element is not found, el will be vec.end()
return vec.erase(p, el);

This will usually be more efficient than removing one element at a time.

Upvotes: 20

Chris Drew
Chris Drew

Reputation: 15334

As @BenjaminLindley and @JMerdich pointed out, for the problem stated, a two-pass approach is probably simpler and more efficient.

In a realistic situation, it is possible that there is some expensive calculation that needs to be done either to determine if an element is invalid or to determine if an element is the one we are looking for:

In which case a two-pass approach becomes less desirable because it causes us to do this expensive calculation twice.

But it is possible to do a single-pass approach without calling std::vector::erase multiple times within the loop. It is not too hard to write std::remove_if yourself, then we can get it to do both checks at the same time. We still just call std::vector::erase once at the end:

std::vector<T>::iterator 
findAndRemoveInvalid(std::vector<T>& vec, U whatImLookingFor) {

  // Find first invalid element - or element you are looking for
  auto first = vec.begin();
  for(;first != vec.end(); ++first) {
    auto result = someExpensiveCalculation(*first);
    if (result == whatImLookingFor)
      return first;  
    if (isInvalid(result))
      break;
  }
  if (first == vec.end())
    return first;

  // Find subsequent valid elements - or element you are looking for
  auto it = first + 1;
  for(;it != vec.end(); it++) {
    auto result = someExpensiveCalculation(*it);
    if (result == whatImLookingFor)
      break;
    if (!isInvalid(result)) {
      *first++ = std::move(*it);  // shift valid elements to the start
      continue;
    }
  }

  // erase defunct elements and return iterator to the element
  // you are looking for, or vec.end() if not found.
  return vec.erase(first, it);
}

Live demo.

It is clearly more complicated though, so measure performance first.

Upvotes: 2

hnefatl
hnefatl

Reputation: 6037

This is an intuitive approach to the problem that keeps the looping structure - while it only performs a single pass, due to repeated erasing it's likely to be less efficient than a two-pass approach. For this more efficient approach, you should use Benjamin Lindley's answer.


Modifying the code in the answer to the first link you gave, it looks like something like this would fit your specification:

for (auto i = vec.begin(); i != vec.end();)
{
    if (*i == whatImLookingFor)
        return i;
    else if (isInvalid(*i))
        i = vec.erase(i);       // slow, don't use this version for real
    else
        ++i;
}
  • If we find the element you're searching for, we return an iterator to it.
  • If we find an invalid element along the way (but not after the desired element), we erase it.
  • We avoid iterator invalidation by keeping the iterator returned from erase.

You'll still need to handle the case where the element's not found, probably by returning vec.end().

Upvotes: 7

Ulrich Eckhardt
Ulrich Eckhardt

Reputation: 17444

If a simple loop, exited with break is too primitive for you, I'd suggest using std::find() to get an iterator to the searched element and then to use vector.erase()to delete the othere elements.

Upvotes: 1

Related Questions