bartek
bartek

Reputation: 525

Divide container into chunks, C++

Given a vector of N elements v = ( 1, 2, 3, 4, ... , N ) return range iterator over all chunks of size K<N. The last range can be smaller than K if N%K!=0.

For example:

v = ("a","b","c","d","e")

display strings

"ab", "cd", "e"

N=v.size();
K=2;

One possible solution is:

for( unsigned int i=0; i<v.size(); i+=K )
    cout << boost::join( v | boost::adaptors::sliced( i, min(i+K, v.size()) ), "" );

This solution is quite ok but it has several problems:

  1. for loop - is it needed?
  2. if you write i+K instead of min(i+K, v.size()) algorithm crushes, one needs to pay additional attention to boundary case. This looks ugly and distracts.

Can you propose more elegant solution? By elegant solution I mean the use of a general algorithm, build in or provided by commonly used library (such as boost).

-------------------------- [edit] --------------------------

Some of you wonted working example, here it is.

#include <iostream>
#include <vector>
#include <string>
#include <boost/range/adaptor/sliced.hpp>
#include <boost/algorithm/string/join.hpp>
#include <boost/assign.hpp> //just for fun

using namespace std;
using namespace boost::assign;

int main(int , char **)
{
    const int K = 2;
    vector< string > v;
    v += "a","b","c","d","e";

    for( unsigned int i=0; i<v.size(); i+=K )
        cout << boost::algorithm::join( 
                    v | boost::adaptors::sliced( i, min(i+K, v.size()) ), "" ) 
             << endl;
}

Output:

ab 
cd
e

Upvotes: 17

Views: 11774

Answers (5)

ampawd
ampawd

Reputation: 1010

I'm sorry to be a bit late to answer, but it looks that no one proposed this solution:

template <typename Cont, typename Func, typename Sep>
void do_chunks(const Cont& cont, size_t K, Func f, Sep sep, char c='\n') {
    size_t size = cont.size();
    for (int i = 0; i < K; ++i) {
        for (int j = i*size / K, n = (i + 1)*size / K; j < n; ++j) {
            f(cont[j]);
        }
        sep(c);
    }
}

std::vector<int> m = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
};
do_chunks(
    m, 
    3, 
    [](const auto& x) { std::cout << x << " "; }, 
    [](char c)        { std::cout << c; }
);

output:

1 2 3
4 5 6 7
8 9 10 11

So when i == K - 1 (i + 1)*size / K == size exactly, thus we are properly iterating over all container elements without any out of bounds accesses.

Upvotes: 1

Marcin
Marcin

Reputation: 238847

I little modified anwser by @BenjaminB and added an example of using this function:

#include <iostream>
#include <vector>

using namespace std;

template<typename Iterator, typename Func>
void chunks(Iterator begin,
            Iterator end,
            iterator_traits<string::iterator>::difference_type k,
            Func f)
{
    Iterator chunk_begin;
    Iterator chunk_end;
    chunk_end = chunk_begin = begin;

    do
    {
        if(std::distance(chunk_end, end) < k)
            chunk_end = end;
        else
            std::advance(chunk_end, k);
        f(chunk_begin,chunk_end);
        chunk_begin = chunk_end;
    }
    while(std::distance(chunk_begin,end) > 0);
}

int main() {
    string in_str{"123123123"};

    vector<string> output_chunks;

    auto f = [&](string::iterator &b, string::iterator &e)
    {
        output_chunks.emplace_back(b, e);
    };

    chunks(in_str.begin(), in_str.end(), 3, f);

    for (string a_chunk: output_chunks)
    {
        cout << a_chunk << endl;
    }

    return 0;
}

The result is:

123
123
123

Hope someone will find it useful.

Upvotes: 1

BenjaminB
BenjaminB

Reputation: 1869

I don't know if it's very elegant, but you can use iterators with standard functions advance and distance :

template<typename Iterator, typename Func, typename Distance>
void chunks(Iterator begin, Iterator end, Distance k ,Func f){
    Iterator chunk_begin;
    Iterator chunk_end;
    chunk_end = chunk_begin = begin;

    do{
        if(std::distance(chunk_end, end) < k)
            chunk_end = end;
        else
            std::advance(chunk_end, k);
        f(chunk_begin,chunk_end);
        chunk_begin = chunk_end;
    }while(std::distance(chunk_begin,end) > 0);
}

Upvotes: 11

Paul Preney
Paul Preney

Reputation: 1293

WRT "For loop is it needed?"

A loop construct is needed if one wants to avoid using std::distance() as one needs to track how much is left. (For random access containers, std::distance() is cheap --but for all others it is too costly for this algorithm.)

WRT i+K / min() issue

Don't write i+K or anything that could cause wrapping/over/underflow issues. Instead track how much is left and subtract. This will require using min().

WRT elegant solution

This algorithm is more "elegant" in that:

  1. The first two "WRT" items above are not issues.
  2. It does not use any external libraries; --it only makes use of std::copy_n() and std::advance().
  3. It exploits argument-dependent lookup if that is needed/desired.
  4. It uses the container's size_type.
  5. It will work efficiently with any container.
  6. If K <= 0 then std::domain_error is thrown.

The solution is C++11 although it can be easily converted to C++98 if one also writes copy_n().

#include <vector>
#include <string>
#include <sstream>
#include <iterator>
#include <iostream>
#include <stdexcept>
#include <algorithm>

template <
  typename Container,
  typename OutIter,
  typename ChunkSepFunctor
>
OutIter chunker(
  Container const& c, 
  typename Container::size_type const& k,
  OutIter o,
  ChunkSepFunctor sep
)
{
  using namespace std;

  if (k <= 0)
    throw domain_error("chunker() requires k > 0");

  auto chunkBeg = begin(c);
  for (auto left=c.size(); left != 0; )
  {
    auto const skip = min(left,k);

    o = copy_n(chunkBeg, skip, o);

    left -= skip;
    advance(chunkBeg, skip);

    if (left != 0)
      sep();
  }
  return o; 
}

int main()
{
  using namespace std;

  using VECTOR = vector<string>;
  VECTOR v{"a","b","c","d","e"};

  for (VECTOR::size_type k = 1; k < 7; ++k)
  {
    cout << "k = " << k << "..." << endl;
    chunker(
      v, k, 
      ostream_iterator<VECTOR::value_type>(cout),
      []() { cout << endl; }
    );
  }
  cout << endl;
}

EDIT: It would be better to write chunker() so that the sep functor received the output iterator and returned an output iterator. This way any updates between outputting chunks concerning the output iterator could be correctly handled and the generic routine far more flexible. (For example, this would allow the functor to remember the end position of each chunk; the functor to copy out chunks, empty the container, and reset the output iterator; etc.) If this is undesired, then like the Standard Library one could have more than one overload with different sep requirements, or, eliminating the argument altogether. This updated chunker() looks like this:

template <
  typename Container,
  typename OutIter,
  typename ChunkSepFunctor
>
OutIter chunker(
  Container const& c, 
  typename Container::size_type const& k,
  OutIter o,
  ChunkSepFunctor sep
)
{
  using namespace std;

  if (k <= 0)
    throw domain_error("chunker() requires k > 0");

  auto chunkBeg = begin(c);
  for (auto left=c.size(); left != 0; )
  {
    auto const skip = min(left,k);

    o = copy_n(chunkBeg, skip, o);

    advance(chunkBeg, skip);
    left -= skip;

    if (left != 0)
      o = sep(o);
  }
  return o; 
}

and the call to chunk would be the less pretty but generally more useful (although not in this case):

chunker(
  v, k, 
  ostream_iterator<VECTOR::value_type>(cout),
  [](ostream_iterator<VECTOR::value_type> o) { cout << endl; return o; }
);

This has turned out to be a surprisingly elegant little routine.

Upvotes: 8

orlp
orlp

Reputation: 117791

This is a sort-of-generic solution with good performance:

template <class T, class Func>
void do_chunks(T container, size_t K, Func func) {
    size_t size = container.size();
    size_t i = 0;

    // do we have more than one chunk?
    if (size > K) {
        // handle all but the last chunk
        for (; i < size - K; i += K) {
            func(container, i, i + K);
        }
    }

    // if we still have a part of a chunk left, handle it
    if (i % K) {
        func(container, i, i + i % K);
    }
}

Upvotes: 9

Related Questions