I have the following code (to split a string into a vector) that gives me a segfault on the second iteration of the for_each loop:
std::vector<std::string> cmd;
std::string command = "Something something something";
std::sregex_token_iterator splitter {command.begin(), command.end(), std::regex{"\\s+"}, -1};
std::sregex_token_iterator splitter_end;
std::for_each(splitter, splitter_end, [&](std::ssub_match sm) {
    cmd.push_back(sm.str());
});
Trying to understand what was going on, I moved the regular expression into a named variable, and it started working:
std::vector<std::string> cmd;
std::string command = "Something something something";
std::regex rx {"\\s+"};
std::sregex_token_iterator splitter {command.begin(), command.end(), rx, -1};
std::sregex_token_iterator splitter_end;
std::for_each(splitter, splitter_end, [&](std::ssub_match sm) {
    cmd.push_back(sm.str());
});
Can anyone explain this to me?
I know the answer, but I don't like it. I think this may be a defect in clang.
std::sregex_token_iterator saves a pointer to the regular expression, not a copy of it.
In the first version, the anonymous std::regex object is destroyed at the end of the full-expression that initializes splitter, i.e. right after splitter is constructed. This leaves splitter pointing at deallocated memory.
In the second version, rx lives until the end of the block, so splitter keeps pointing at a valid object.
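A minimal self-contained sketch of the working approach, assuming the goal is simply to split on runs of whitespace. The helper name split is hypothetical; the point is that rx is a named local that lives until the end of the function, so it outlives every iterator that points at it.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on runs of whitespace.
// `rx` is a named local, so it lives until the function returns and
// outlives every sregex_token_iterator that stores a pointer to it.
std::vector<std::string> split(const std::string& text) {
    const std::regex rx{"\\s+"};
    std::vector<std::string> out;
    // Submatch index -1 yields the text *between* matches of `rx`.
    std::sregex_token_iterator it{text.begin(), text.end(), rx, -1};
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For example, split("Something something something") returns the three words, which could then be assigned to cmd.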
The std::regex_token_iterator constructor in libc++:
template <class _BidirectionalIterator, class _CharT, class _Traits>
regex_token_iterator<_BidirectionalIterator, _CharT, _Traits>::
regex_token_iterator(_BidirectionalIterator __a, _BidirectionalIterator __b,
const regex_type& __re, int __submatch,
regex_constants::match_flag_type __m)
: __position_(__a, __b, __re, __m),
_N_(0),
__subs_(1, __submatch)
{
__init(__a, __b);
}
This constructs __position_, of type std::regex_iterator:
template <class _BidirectionalIterator, class _CharT, class _Traits>
regex_iterator<_BidirectionalIterator, _CharT, _Traits>::
regex_iterator(_BidirectionalIterator __a, _BidirectionalIterator __b,
const regex_type& __re, regex_constants::match_flag_type __m)
: __begin_(__a),
__end_(__b),
__pregex_(&__re),
__flags_(__m)
{
_VSTD::regex_search(__begin_, __end_, __match_, *__pregex_, __flags_);
}
This stores the address of __re in the pointer __pregex_. Once the regex that __re refers to goes out of scope and is destroyed, __position_ is left with a dangling pointer, which is dereferenced again the next time the iterator is incremented.
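To make the mechanism concrete, here is a heavily simplified, hypothetical MiniRegexIterator (not the real libc++ class) that, like regex_iterator above, stores only the address of the regex and calls std::regex_search through that pointer again on every step. It assumes a pattern that never matches the empty string, because the real iterator's empty-match and match_prev_avail handling is omitted.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical, heavily simplified stand-in for std::regex_iterator.
// It keeps only a *pointer* to the regex (pregex_), not a copy, and
// dereferences that pointer again every time next() is called.
// Assumes the pattern never matches an empty string.
class MiniRegexIterator {
public:
    MiniRegexIterator(std::string::const_iterator first,
                      std::string::const_iterator last,
                      const std::regex& re)
        : cur_(first), end_(last), pregex_(&re) {  // stores the address only
        // First search: runs while `re` is still alive.
        found_ = std::regex_search(cur_, end_, match_, *pregex_);
    }

    bool done() const { return !found_; }
    std::string current() const { return match_.str(); }

    void next() {
        if (!found_) return;
        cur_ = match_[0].second;  // resume right after the previous match
        // *pregex_ must still refer to a live regex here; if the regex
        // was a temporary, this read is undefined behaviour.
        found_ = std::regex_search(cur_, end_, match_, *pregex_);
    }

private:
    std::string::const_iterator cur_;
    std::string::const_iterator end_;
    const std::regex* pregex_;
    std::smatch match_;
    bool found_ = false;
};
```

With a named regex such as std::regex word{"\\w+"}, walking "Something something something" visits three matches. With a temporary regex, the first match would still be found (the constructor's search runs while the temporary is alive), but next() would read freed memory, which is consistent with the crash appearing on the second iteration of for_each.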
Final Note
The following works:
std::string command = "Something something something";
std::for_each(std::sregex_token_iterator{command.begin(), command.end(), std::regex{"\\s+"}, -1},
std::sregex_token_iterator{},
[&](std::ssub_match sm) {
cmd.push_back(sm.str());
});
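This works because the anonymous std::regex and the anonymous std::sregex_token_iterator are temporaries of the same full-expression, so the regex is not destroyed until std::for_each has returned.
Note that since C++14 the standard declares the regex_iterator and regex_token_iterator constructors taking a const regex_type&& as deleted, so with a C++14 (or later) library both this version and the first one are rejected at compile time instead of compiling.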
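The C++14 fix for this can be mimicked in your own pointer-storing types: delete the constructor overload that takes const std::regex&&, so a temporary binds to the deleted overload and fails to compile. A minimal sketch with a hypothetical PatternRef class:

```cpp
#include <regex>
#include <type_traits>

// Hypothetical type that, like std::sregex_token_iterator, stores only
// a pointer to the regex it is given.
class PatternRef {
public:
    explicit PatternRef(const std::regex& re) : pregex_(&re) {}

    // A temporary (rvalue) regex prefers this overload over
    // const std::regex&, so passing one is a compile error instead
    // of a dangling pointer.
    explicit PatternRef(const std::regex&&) = delete;

    const std::regex& get() const { return *pregex_; }

private:
    const std::regex* pregex_;
};

// A named (lvalue) regex is accepted; a temporary is rejected.
static_assert(std::is_constructible<PatternRef, std::regex&>::value,
              "lvalue regex should be accepted");
static_assert(!std::is_constructible<PatternRef, std::regex>::value,
              "temporary regex should be rejected");
```

The cost of this design is that callers can no longer write the one-liner from the Final Note; they must give the regex a name, which is exactly what makes its lifetime explicit.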