pistacchio

Reputation: 58883

Object creation on the spot vs variable declaration

I have the following code (to split a string into a vector) that gives me a segfault on the second iteration of the for_each loop:

std::string command = "Something something something";
std::sregex_token_iterator splitter {command.begin(), command.end(), std::regex{"\\s+"}, -1};
std::sregex_token_iterator splitter_end;
std::for_each(splitter, splitter_end, [&](std::ssub_match sm) {
    cmd.push_back(sm.str());
});

Trying to understand what was going on, I moved the regular expression into a named variable, and it started working:

std::string command = "Something something something";
std::regex rx {"\\s+"};
std::sregex_token_iterator splitter {command.begin(), command.end(), rx, -1};
std::sregex_token_iterator splitter_end;
std::for_each(splitter, splitter_end, [&](std::ssub_match sm) {
    cmd.push_back(sm.str());
});

Can anyone explain this to me?

Upvotes: 2

Views: 154

Answers (1)

Jeffery Thomas

Reputation: 42588

I know the answer, but I don't like it. I think this may be a defect in clang (more precisely, in its standard library, libc++).

std::sregex_token_iterator stores a pointer to the regular expression, not a copy of it.

In the first version, the anonymous std::regex object is destroyed at the end of the statement that constructs splitter. This leaves splitter pointing at an object that no longer exists.

In the second version, rx lives until the end of the block, so splitter points at a valid object for the whole loop.
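As a rough sketch of the second version packaged for reuse: the helper name split_on_whitespace is hypothetical (it is not in the question), and the only point it illustrates is that the regex is a named local that outlives both iterators.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption): splits on runs of whitespace.
std::vector<std::string> split_on_whitespace(const std::string& command) {
    // Named object: it lives until the end of the function, so the pointer
    // stored inside the iterator stays valid for the whole loop.
    const std::regex rx{"\\s+"};

    std::vector<std::string> cmd;
    std::sregex_token_iterator splitter{command.begin(), command.end(), rx, -1};
    std::sregex_token_iterator splitter_end;
    std::for_each(splitter, splitter_end, [&](const std::ssub_match& sm) {
        cmd.push_back(sm.str());  // copy each token out of the input string
    });
    return cmd;
}
```

The tokens are copied into the returned vector, so the result does not depend on rx or on the iterators once the function returns.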


std::regex_token_iterator constructor

template <class _BidirectionalIterator, class _CharT, class _Traits>
regex_token_iterator<_BidirectionalIterator, _CharT, _Traits>::
regex_token_iterator(_BidirectionalIterator __a, _BidirectionalIterator __b,
                     const regex_type& __re, int __submatch,
                     regex_constants::match_flag_type __m)
: __position_(__a, __b, __re, __m),
_N_(0),
__subs_(1, __submatch)
{
    __init(__a, __b);
}

This constructs __position_, which is of type std::regex_iterator:

template <class _BidirectionalIterator, class _CharT, class _Traits>
regex_iterator<_BidirectionalIterator, _CharT, _Traits>::
regex_iterator(_BidirectionalIterator __a, _BidirectionalIterator __b,
               const regex_type& __re, regex_constants::match_flag_type __m)
: __begin_(__a),
__end_(__b),
__pregex_(&__re),
__flags_(__m)
{
    _VSTD::regex_search(__begin_, __end_, __match_, *__pregex_, __flags_);
}

This stores the address of __re in a pointer. Once the object that __re refers to is destroyed, __position_ is left with a dangling pointer.
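One way a library can turn this kind of mistake into a compile-time error is to delete the overload that would bind to a temporary; newer standard library versions do this for the regex iterators. The sketch below shows the idea on a hypothetical class, PatternHolder, which is not libc++ code and only mimics the pointer-storing pattern of __pregex_.

```cpp
#include <regex>
#include <string>
#include <type_traits>

// Hypothetical class (an assumption for illustration): it keeps a pointer
// to the regex, as __pregex_ does in the constructor above.
class PatternHolder {
public:
    explicit PatternHolder(const std::regex& re) : pregex_(&re) {}

    // Deleted rvalue overload: passing a temporary regex no longer compiles,
    // so the stored pointer cannot dangle from the very first statement.
    explicit PatternHolder(const std::regex&&) = delete;

    bool matches(const std::string& text) const {
        return std::regex_search(text, *pregex_);
    }

private:
    const std::regex* pregex_;  // non-owning; the caller keeps the regex alive
};

static_assert(std::is_constructible<PatternHolder, const std::regex&>::value,
              "a named regex is accepted");
static_assert(!std::is_constructible<PatternHolder, std::regex>::value,
              "a temporary regex is rejected");
```

With this in place, PatternHolder{std::regex{"\\s+"}} is rejected by the compiler, while constructing from a named std::regex works as before.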


Final Note

The following works:

std::string command = "Something something something";
std::for_each(std::sregex_token_iterator{command.begin(), command.end(), std::regex{"\\s+"}, -1},
              std::sregex_token_iterator{},
              [&](std::ssub_match sm) {
    cmd.push_back(sm.str());
});

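This is because the anonymous std::regex lives as long as the anonymous std::sregex_token_iterator object: both temporaries are destroyed at the end of the full expression, after std::for_each has returned. Note that this relies on the constructor accepting a temporary std::regex; standard libraries that delete that overload (the later fix for this problem, LWG issue 2332) reject the code at compile time.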

Upvotes: 4
