Vladimir Talybin
Vladimir Talybin

Reputation: 614

Extract input iterator from std::copy and std::copy_n

I trying to implement an unserialization method that take an input iterator and perform a series of chunk reads (using std::copy and std::copy_n). Something like this (just an example):

template <class InputIt>
InputIt unserialize(InputIt it)
{
  std::copy_n(it, sizeof(header_type), reinterpret_cast<char*>(&header));
  std::copy_n(it, header.payload_size, std::back_inserter(payload));
  it = optional.unserialize(it);
  return it;
}

How do I advance input iterator in this case so each following call to std::copy_n continue read from it and I can finally return it?

I want to be generic to iterator categories (especially RandomAccessIterator and InputIterator) for performance reasons and hope it is possible to use std::copy methods without need to rewrite these. Stuff like bound checking and such will be done by iterator adapter or checked before unserialization call if size is known.

What do not work nor acceptable:

  1. Using std::copy_n<InputIt&>(it, ...) may work for some categories but not for all and it's too unreliable.
  2. Using std::advance after each call leads to rereading same block again for some iterators. Not preferable and may be impossible for some sources.

UPDATE Making iterator reference adapter doesn't help as random access iterator version of copy_n return pointer to past the last element copied while input iterator version return pointer to the last element copied. So I guess own version of copy_n works best with additional iterator adapter for bound checking.

Upvotes: 9

Views: 867

Answers (3)

LanYi
LanYi

Reputation: 169

I know this is an old question, but... Recently I've run into similar issue, and I feel pretty sad when I see there isn't any good solution for this problem... Nevertheless, I've tried to write an iterator wrapper like what Toby Speight said, and I've extended it a bit. This wrapper is only used for InputIterator, because other kind of iterators supported by std::copy and std::copy_n are multipass iterators, and it's possible to directly use std::next and std::advance on them instead of using this wrapper.

template<typename Iterator>
struct Wrapper {
    using traits = std::iterator_traits<Iterator>;
    using difference_type = typename Traits::difference_type;
    using reference = typename Traits::reference;
    //...

    reference operator*() const { return *(*(this->current)); }

    //need implement operator++(int) too, in the similar way
    Wrapper& operator++() { 
        if(this->indicator != 0) {
            ++(*(this->current));
            --this->indicator;
        }

        return *this;
    }

    //need to implement operator!= too, in the similar way
    bool operator==(const Wrapper& other) const { 
        return *(this->current) == *(other.current) or this->indicator == other.indicator;
    }

    Iterator* current;
    difference_type indicator;
};

template<typename Iterator>
struct Range {
    using category = typename std::iterator_traits<Iterator>::iterator_category;
    //...
    using difference_type = typename std::iterator_traits<Iterator>::difference_type;

    constexpr bool isForwardOrAbove() { 
        return std::is_base_of_v<std::forward_iterator_tag, category>;
    }

    using iterator = std::conditional_t<isForwardOrAbove(), Iterator, Wrapper<Iterator>>;

    std::pair<iterator, iterator> splitSubRange(difference_type n) {
        if constexpr (isForwardOrAbove()) {
            //forward iterators are multi-pass iterators, so using std::advance on one iterator will not invalidate its copies
            auto oldBegin = this->begin;
            std::advance(this->begin, n);
            return {oldBegin, std::next(oldBegin, n)};
        }
        else {
            //non-forward-iterator
            return {{&(this->begin), n}, {&(this->end), 0}};
        }
    }

    Iterator begin;
    Iterator end;
}

To use them, firstly you need a Range object which holds iterators, and use the splitSubRange method to obtain plain iterators, or, if they are only InputIterators, iterator wrappers.

template<typename InputIterator>
InputIterator unserialize(InputIterator begin, InputIterator end) {
    auto range = Range<InputIterator>{begin, end};
    auto [begin, end] = range.splitSubRange(sizeof(header_type));
    //before having "used" the iterator obtained from range, you shouldn't use range again.
    std::copy(begin, end, header.begin());
    //after having "used" the iterator, you can use the range object to obtain the next sub-range
    auto [begin2, dummy] = range.splitSubRange(sizeof(header_type_2));
    std::copy_n(begin2, sizeof(header_type_2), header2.begin());
    ...
    return range.begin;
}

Upvotes: 0

Toby Speight
Toby Speight

Reputation: 30718

You could create an iterator adapter that always works through a reference to the supplied iterator. Here's the basic skeleton of such an adapter:

#include <iterator>

template<typename InputIterator>
struct IteratorReference
{
    InputIterator& it;

    IteratorReference(InputIterator& it)
        : it(it)
    {}
    // {copy, move, destructor} == default

    // iterator methods (TODO: add the rest)
    IteratorReference& operator++()
    { ++it; return this; }

    typename InputIterator::reference operator*()
    { return *it; }

    // Convert back to original iterator
    operator InputIterator() const
    { return it; }
};


template<typename InputIterator>
IteratorReference<InputIterator> make_iterator_reference(InputIterator it)
{
    return it;
}

To use it, simply create and use a wrapper in place of the original iterator:

#include <algorithm>
#include <vector>

struct Header
{
    std::size_t payload_size;
};

template <class InputIt>
InputIt unserialize(InputIt it, Header& header, std::vector<char>& payload)
{
    auto i_ref = make_iterator_reference(it);
    std::copy_n(i_ref, sizeof header, reinterpret_cast<char*>(&header));
    std::copy_n(i_ref, header.payload_size, std::back_inserter(payload));
    i_ref = optional.unserialize(i_ref);
    return i_ref;
}

Upvotes: 2

PiotrNycz
PiotrNycz

Reputation: 24347

For random access iterator this form can be used - and it is fine:

template <class InputIt, class N, class OutputIt>
InputIt copy_n_advance_input(InputIt it, N dist, OutputIt outIt)
{
    std::copy_n(it, dist, outIt);
    return std::next(it, dist);
}

Unfortunately - problems are when we want to deal with one pass input iterator - like here (got 'd' - not 'c'):

std::string s = "abcd";
std::istringstream ss{s};
auto e = copy_n_advance_input(std::istream_iterator<char>(ss), 
                             2, 
                             std::ostream_iterator<char>(std::cout, ","));
std::cout << "\n" << *e << "\n";

So it seems it is needed, as usual in STL, two forms:

template <class InputIt, class N, class OutputIt>
InputIt copy_n_advance_input_impl(InputIt it, N dist, OutputIt outIt,
                                  std::input_iterator_tag)
{
    while (dist-- > 0)
    {
        *outIt = *it;
        ++outIt;
        ++it;
    }
    return it;
}

template <class InputIt, class N, class OutputIt>
InputIt copy_n_advance_input_impl(InputIt it, N dist, OutputIt outIt, 
                                  std::random_access_iterator_tag)
{
    std::copy_n(it, dist, outIt);
    return std::next(it, dist);
}

template <class InputIt, class N, class OutputIt>
InputIt copy_n_advance_input(InputIt it, N dist, OutputIt outIt)
{
    return copy_n_advance_input_impl(it, dist, outIt, typename std::iterator_traits<InputIt>::iterator_category {});
}

Note, the proposed version for std::input_iterator_tag is not as efficient as in STL (at least for gcc) - it does extra iteration for input - this iteration is not necessary to perform copy - but it is needed to return beginning of "after copy" range (stl_algo.h):

754   template<typename _InputIterator, typename _Size, typename _OutputIterator>
755     _OutputIterator
756     __copy_n(_InputIterator __first, _Size __n,
757          _OutputIterator __result, input_iterator_tag)
758     {
759       if (__n > 0)
760     {
761       while (true)
762         {
763           *__result = *__first;
764           ++__result;
765           if (--__n > 0)
766         ++__first;
767           else
768         break;
769         }
770     }
771       return __result;
772     }

Last note - it is wiser to use, for random-access-iterators (like std::vector::iterator) the version that just calls std algorithms - because they can be even more optimized - e.g. for contiguous memory iterator over POD types - that can be just memcpy'ied. Or some specialization for std::vector<bool> or std::deque<T> exist, that uses their internal structure to perform copy in most efficient way.

Upvotes: 2

Related Questions