Darkenor
Darkenor

Reputation: 4449

How can I avoid "for" loops with an "if" condition inside them with C++?

With almost all code I write, I am often dealing with set reduction problems on collections that ultimately end up with naive "if" conditions inside of them. Here's a simple example:

for(int i=0; i<myCollection.size(); i++)
{
     if (myCollection[i] == SOMETHING)
     {
           DoStuff();
     }
}

With functional languages, I can solve the problem by reducing the collection to another collection (easily) and then perform all operations on my reduced set. In pseudocode:

newCollection <- myCollection where <x=true
map DoStuff newCollection

And in other C variants, like C#, I could reduce with a where clause like

foreach (var x in myCollection.Where(c=> c == SOMETHING)) 
{
   DoStuff();
}

Or better (at least to my eyes)

myCollection.Where(c=>c == Something).ToList().ForEach(d=> DoStuff(d));

Admittedly, I am doing a lot of paradigm mixing and subjective/opinion based style, but I can't help but feel that I am missing something really fundamental that could allow me to use this preferred technique with C++. Could someone enlighten me?

Upvotes: 117

Views: 26574

Answers (13)

mathreadler
mathreadler

Reputation: 501

If DoStuff() would be dependent on i somehow in the future then I'd propose this guaranteed branch-free bit-masking variant.

unsigned int times = 0;
const int kSize = sizeof(unsigned int)*8;
for(int i = 0; i < myCollection.size()/kSize; i++){
  unsigned int mask = 0;
  for (int j = 0; j<kSize; j++){
    mask |= (myCollection[i*kSize+j]==SOMETHING) << j;
  }
  times+=popcount(mask);
}

for(int i=0;i<times;i++)
   DoStuff();

Where popcount is any function doing a population count ( count number of bits = 1 ). There will be some freedom to put more advanced constraints with i and their neighbors. If that is not needed we can strip the inner loop and remake the outer loop

for(int i = 0; i < myCollection.size(); i++)
  times += (myCollection[i]==SOMETHING);

followed by a

for(int i=0;i<times;i++)
   DoStuff();

Sorry for necroing this thread, but I realized just now that my method does not explain how to store information about which "i" to use and which to skip. This can be solved with an array of pointers to objects with operator() and a "dummy object" which overloads operator() to do nothing. We insert the pointer to the dummy object in all the places where the logical condition evaluates to 0 and the pointer to the actual object wherever it evaluates to 1. If we are not dependent on some silly MISRA compliancy this can be achieved easily and neatly with arithmetics as a binary linear combination on pointers.

Another solution without dummy objects or null pointers (also avoiding possibly silly MISRA trouble) is to simply use the logical condition as a count-up variable to an index pointing into an array and just put the pointers there. This way the ones which fail the logical test will be overwritten by the ones which succeed but not the other way around.

Upvotes: 7

einpoklum
einpoklum

Reputation: 131435

One can describe your code pattern as applying some function to a subset of a range, or in other words: applying it to the result of applying a filter to the whole range.

This is achievable in the most straightforward manner with Eric Neibler's ranges-v3 library; although it's a bit of an eyesore, because you want to work with indices:

using namespace ranges;
auto mycollection_has_something = 
    [&](std::size_t i) { return myCollection[i] == SOMETHING };
auto filtered_view = 
    views::iota(std::size_t{0}, myCollection.size()) | 
    views::filter(mycollection_has_something);
for (auto i : filtered_view) { DoStuff(); }

But if you're willing to forego indices, you'd get:

auto is_something = [&SOMETHING](const decltype(SOMETHING)& x) { return x == SOMETHING };
auto filtered_collection = myCollection | views::filter(is_something);
for (const auto& x : filtered_collection) { DoStuff(); }

which is nicer IMHO.

PS - The ranges library is mostly going into the C++ standard in C++20.

Upvotes: 1

v.oddou
v.oddou

Reputation: 6775

I'll just mention Mike Acton, he would definitely say:

If you have to do that, you have a problem with your data. Sort your data!

Upvotes: 0

mathreadler
mathreadler

Reputation: 501

Another solution in case the i:s are important. This one builds a list that fills in the indexes of which to call doStuff() for. Once again the main point is to avoid the branching and trade it for pipelineable arithmetic costs.

int buffer[someSafeSize];
int cnt = 0; // counter to keep track where we are in list.
for( int i = 0; i < container.size(); i++ ){
   int lDecision = (container[i] == SOMETHING);
   buffer[cnt] = lDecision*i + (1-lDecision)*buffer[cnt];
   cnt += lDecision;
}

for( int i=0; i<cnt; i++ )
   doStuff(buffer[i]); // now we could pass the index or a pointer as an argument.

The "magical" line is the buffer loading line that arithmetically calculates wether to keep the value and stay in position or to count up position and add value. So we trade away a potential branch for some logics and arithmetics and maybe some cache hits. A typical scenario when this would be useful is if doStuff() does a small amount of pipelineable calculations and any branch in between calls could interrupt those pipelines.

Then just loop over the buffer and run doStuff() until we reach cnt. This time we will have the current i stored in the buffer so we can use it in the call to doStuff() if we would need to.

Upvotes: 2

Simon Richter
Simon Richter

Reputation: 29586

The idea of avoiding

for(...)
    if(...)

constructs as an antipattern is too broad.

It is completely fine to process multiple items that match a certain expression from inside a loop, and the code cannot get much clearer than that. If the processing grows too large to fit on screen, that is a good reason to use a subroutine, but still the conditional is best placed inside the loop, i.e.

for(...)
    if(...)
        do_process(...);

is vastly preferable to

for(...)
    maybe_process(...);

It becomes an antipattern when only one element will match, because then it would be clearer to first search for the element, and perform the processing outside of the loop.

for(int i = 0; i < size; ++i)
    if(i == 5)

is an extreme and obvious example of this. More subtle, and thus more common, is a factory pattern like

for(creator &c : creators)
    if(c.name == requested_name)
    {
        unique_ptr<object> obj = c.create_object();
        obj.owner = this;
        return std::move(obj);
    }

This is hard to read, because it isn't obvious that the body code will be executed once only. In this case, it would be better to separate the lookup:

creator &lookup(string const &requested_name)
{
    for(creator &c : creators)
        if(c.name == requested_name)
            return c;
}

creator &c = lookup(requested_name);
unique_ptr obj = c.create_object();

There is still an if within a for, but from the context it becomes clear what it does, there is no need to change this code unless the lookup changes (e.g. to a map), and it is immediately clear that create_object() is called only once, because it is not inside a loop.

Upvotes: 21

Loreto
Loreto

Reputation: 694

Also, if you don't care reordering the collection, std::partition is cheap.

#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>

void DoStuff(int i)
{
    std::cout << i << '\n';
}

int main()
{
    using namespace std::placeholders;

    std::vector<int> v {1, 2, 5, 0, 9, 5, 5};
    const int SOMETHING = 5;

    std::for_each(v.begin(),
                  std::partition(v.begin(), v.end(),
                                 std::bind(std::equal_to<int> {}, _1, SOMETHING)), // some condition
                  DoStuff); // action
}

Upvotes: 6

Ian Parberry
Ian Parberry

Reputation: 77

I am in awe of the complexity of the above solutions. I was going to suggest a simple #define foreach(a,b,c,d) for(a; b; c)if(d) but it has a few obvious deficits, for example, you have to remember to use commas instead of semicolons in your loop, and you can't use the comma operator in a or c.

#include <list>
#include <iostream>

using namespace std; 

#define foreach(a,b,c,d) for(a; b; c)if(d)

int main(){
  list<int> a;

  for(int i=0; i<10; i++)
    a.push_back(i);

  for(auto i=a.begin(); i!=a.end(); i++)
    if((*i)&1)
      cout << *i << ' ';
  cout << endl;

  foreach(auto i=a.begin(), i!=a.end(), i++, (*i)&1)
    cout << *i << ' ';
  cout << endl;

  return 0;
}

Upvotes: 5

lorro
lorro

Reputation: 10880

Boost provides ranges that can be used w/ range-based for. Ranges have the advantage that they don't copy the underlying data structure, they merely provide a 'view' (that is, begin(), end() for the range and operator++(), operator==() for the iterator). This might be of your interest: http://www.boost.org/libs/range/doc/html/range/reference/adaptors/reference/filtered.html

#include <boost/range/adaptor/filtered.hpp>
#include <iostream>
#include <vector>

struct is_even
{
    bool operator()( int x ) const { return x % 2 == 0; }
};

int main(int argc, const char* argv[])
{
    using namespace boost::adaptors;

    std::vector<int> myCollection{1,2,3,4,5,6,7,8,9};

    for( int i: myCollection | filtered( is_even() ) )
    {
        std::cout << i;
    }
}

Upvotes: 49

bipll
bipll

Reputation: 11940

for(auto const &x: myCollection) if(x == something) doStuff();

Looks pretty much like a C++-specific for comprehension to me. To you?

Upvotes: 11

user743382
user743382

Reputation:

One style that gets used enough to mention, but hasn't been mentioned yet, is:

for(int i=0; i<myCollection.size(); i++) {
  if (myCollection[i] != SOMETHING)
    continue;

  DoStuff();
}

Advantages:

  • Doesn't change the indentation level of DoStuff(); when condition complexity increases. Logically, DoStuff(); should be at the top-level of the for loop, and it is.
  • Immediately makes it clear that the loop iterates over the SOMETHINGs of the collection, without requiring the reader to verify that there is nothing after the closing } of the if block.
  • Doesn't require any libraries or helper macros or functions.

Disadvantages:

  • continue, like other flow control statements, gets misused in ways that lead to hard-to-follow code so much that some people are opposed to any use of them: there is a valid style of coding that some follow that avoids continue, that avoids break other than in a switch, that avoids return other than at the end of a function.

Upvotes: 15

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275230

Here is a quick relatively minimal filter function.

It takes a predicate. It returns a function object that takes an iterable.

It returns an iterable that can be used in a for(:) loop.

template<class It>
struct range_t {
  It b, e;
  It begin() const { return b; }
  It end() const { return e; }
  bool empty() const { return begin()==end(); }
};
template<class It>
range_t<It> range( It b, It e ) { return {std::move(b), std::move(e)}; }

template<class It, class F>
struct filter_helper:range_t<It> {
  F f;
  void advance() {
    while(true) {
      (range_t<It>&)*this = range( std::next(this->begin()), this->end() );
      if (this->empty())
        return;
      if (f(*this->begin()))
        return;
    }
  }
  filter_helper(range_t<It> r, F fin):
    range_t<It>(r), f(std::move(fin))
  {
      while(true)
      {
          if (this->empty()) return;
          if (f(*this->begin())) return;
          (range_t<It>&)*this = range( std::next(this->begin()), this->end() );
      }
  }
};

template<class It, class F>
struct filter_psuedo_iterator {
  using iterator_category=std::input_iterator_tag;
  filter_helper<It, F>* helper = nullptr;
  bool m_is_end = true;
  bool is_end() const {
    return m_is_end || !helper || helper->empty();
  }

  void operator++() {
    helper->advance();
  }
  typename std::iterator_traits<It>::reference
  operator*() const {
    return *(helper->begin());
  }
  It base() const {
      if (!helper) return {};
      if (is_end()) return helper->end();
      return helper->begin();
  }
  friend bool operator==(filter_psuedo_iterator const& lhs, filter_psuedo_iterator const& rhs) {
    if (lhs.is_end() && rhs.is_end()) return true;
    if (lhs.is_end() || rhs.is_end()) return false;
    return lhs.helper->begin() == rhs.helper->begin();
  }
  friend bool operator!=(filter_psuedo_iterator const& lhs, filter_psuedo_iterator const& rhs) {
    return !(lhs==rhs);
  }
};
template<class It, class F>
struct filter_range:
  private filter_helper<It, F>,
  range_t<filter_psuedo_iterator<It, F>>
{
  using helper=filter_helper<It, F>;
  using range=range_t<filter_psuedo_iterator<It, F>>;

  using range::begin; using range::end; using range::empty;

  filter_range( range_t<It> r, F f ):
    helper{{r}, std::forward<F>(f)},
    range{ {this, false}, {this, true} }
  {}
};

template<class F>
auto filter( F&& f ) {
    return [f=std::forward<F>(f)](auto&& r)
    {
        using std::begin; using std::end;
        using iterator = decltype(begin(r));
        return filter_range<iterator, std::decay_t<decltype(f)>>{
            range(begin(r), end(r)), f
        };
    };
};

I took short cuts. A real library should make real iterators, not the for(:)-qualifying pseudo-fascades I did.

At point of use, it looks like this:

int main()
{
  std::vector<int> test = {1,2,3,4,5};
  for( auto i: filter([](auto x){return x%2;})( test ) )
    std::cout << i << '\n';
}

which is pretty nice, and prints

1
3
5

Live example.

There is a proposed addition to C++ called Rangesv3 which does this kind of thing and more. boost also has filter ranges/iterators available. boost also has helpers that make writing the above much shorter.

Upvotes: 17

Jonathan Wakely
Jonathan Wakely

Reputation: 171263

Instead of creating a new algorithm, as the accepted answer does, you can use an existing one with a function that applies the condition:

std::for_each(first, last, [](auto&& x){ if (cond(x)) { ... } });

Or if you really want a new algorithm, at least reuse for_each there instead of duplicating the iteration logic:

template<typename Iter, typename Pred, typename Op> 
  void
  for_each_if(Iter first, Iter last, Pred p, Op op) {
    std::for_each(first, last, [&](auto& x) { if (p(x)) op(x); });
  }

Upvotes: 46

Dimitrios Bouzas
Dimitrios Bouzas

Reputation: 42889

IMHO it's more straight forward and more readable to use a for loop with an if inside it. However, if this is annoying for you, you could use a for_each_if like the one below:

template<typename Iter, typename Pred, typename Op> 
void for_each_if(Iter first, Iter last, Pred p, Op op) {
  while(first != last) {
    if (p(*first)) op(*first);
    ++first;
  }
}

Usecase:

std::vector<int> v {10, 2, 10, 3};
for_each_if(v.begin(), v.end(), [](int i){ return i > 5; }, [](int &i){ ++i; });

Live Demo

Upvotes: 100

Related Questions