Reputation: 31
I used regex_token_iterator<> to get all matched substrings in a line, as suggested in this question. However, the code sometimes misses the second matched substring in a line, and the lines where this happens change between runs. Is this a bug in regex_token_iterator<>, or is there something wrong in my code? I compiled the following code with Apple clang version 14.0.0 (clang-1400.0.29.202) using -std=c++14.
I also tried another suggestion from that question, which is to call regex_search() repeatedly in a while loop, and that version worked properly. I just want to understand why the regex_token_iterator<> version does not work, i.e. whether my usage is wrong.
```cpp
#include <fstream>
#include <regex>
#include <sstream>
#include <string>
#include <typeinfo>
using namespace std;

struct bad_from_string : bad_cast{
    const char* what() const noexcept override{
        return "bad cast from string";
    }
};

template<typename T>
T from_string(const string& s){
    istringstream is{s};
    T t;
    if(!(is >> t)) throw bad_from_string{};
    return t;
}

int main(){
    regex pat{R"((\d{1,2})/(\d{1,2})/(\d{4}))"}; // e.g. 7/21/2022
    ifstream ifs{"test_regex_token_iterator.txt"};
    ofstream ofs{"test_out_regex_token_iterator.txt"};
    regex_token_iterator<string::iterator> rend; // default constructor is used for indicating the end of the sequence
    for(string line; getline(ifs, line);){
        smatch matches;
        string replace_pattern;
        int month{0}, day{0}, year{0};
        regex_token_iterator<string::iterator> riter(line.begin(), line.end(), pat);
        while(riter != rend){
            // for each matched substring, replace it individually
            string matched_substring{(*riter).str()};
            // *riter returns a reference to the sub_match object riter is pointing to.
            // sub_match is not a string. sub_match::str() returns the string of the sub_match.
            // put each matched substring into variable "matches"
            regex_search(matched_substring, matches, pat);
            // get the day, month, and year values in int
            day = from_string<int>(matches.str(2));
            month = from_string<int>(matches.str(1));
            year = from_string<int>(matches.str(3));
            // here make replace_pattern yyyy-mm-dd
            if(month<10 && day<10)
                replace_pattern = to_string(year)+"-0"+to_string(month)+"-0"+to_string(day); // both day and month need the leading '0'
            else if(month<10)
                replace_pattern = to_string(year)+"-0"+to_string(month)+"-"+to_string(day);
            else if(day<10)
                replace_pattern = to_string(year)+"-"+to_string(month)+"-0"+to_string(day);
            else
                replace_pattern = to_string(year)+"-"+to_string(month)+"-"+to_string(day);
            line = regex_replace(line, regex(matched_substring), replace_pattern); // regex_replace() returns a string
            // since I want to replace only 1 matched substring *riter, I use the exact substring
            // in the place of regex pattern
            ++riter; // move to the next matched substring
        }
        ofs << line << endl;
    }
    return 0;
}
```
Sample test_regex_token_iterator.txt:

```
12/01/2022 - 12/31/2022
12/01/2022 - 12/31/2022
12/01/2022 - 12/31/2022
12/01/2022 - 12/31/2022
10/01/2022 - 10/31/2022
10/01/2022 - 10/31/2022
10/01/2022 - 10/31/2022
10/01/2022 - 10/31/2022
10/01/2022 - 10/31/2022
```
Sample test_out_regex_token_iterator.txt (the result changes between runs):

```
2022-12-01 - 12/31/2022
2022-12-01 - 2022-12-31
2022-12-01 - 12/31/2022
2022-12-01 - 12/31/2022
2022-10-01 - 10/31/2022
2022-10-01 - 2022-10-31
2022-10-01 - 10/31/2022
2022-10-01 - 10/31/2022
2022-10-01 - 10/31/2022
```
I expected all the matched substrings, including the dates in the second column, to be replaced, but only some of them were. The expected result:

```
2022-12-01 - 2022-12-31
2022-12-01 - 2022-12-31
2022-12-01 - 2022-12-31
2022-12-01 - 2022-12-31
2022-10-01 - 2022-10-31
2022-10-01 - 2022-10-31
2022-10-01 - 2022-10-31
2022-10-01 - 2022-10-31
2022-10-01 - 2022-10-31
```
Upvotes: 0
Views: 62
Reputation: 36488
Enabling AddressSanitizer (`-fsanitize=address`) shows that your code causes undefined behaviour: `riter` contains iterators into `line`, but at the end of your `while` loop you reassign `line`, invalidating `line`'s iterators and therefore invalidating `riter`. When you then try to increment `riter`, you enter the realm of undefined behaviour, which is also why different lines are affected on different runs.
Adding a separate string for your output fixes the problem:
```cpp
for(string line; getline(ifs, line);){
    smatch matches;
    string outputLine = line;
    string replace_pattern;
    int month{0}, day{0}, year{0};
    regex_token_iterator<string::iterator> riter(line.begin(), line.end(), pat);
    while(riter != rend){
        // for each matched substring, replace it individually
        string matched_substring{(*riter).str()};
        // *riter returns a reference to the sub_match object riter is pointing to.
        // sub_match is not a string. sub_match::str() returns the string of the sub_match.
        // put each matched substring into variable "matches"
        regex_search(matched_substring, matches, pat);
        // get the day, month, and year values in int
        day = from_string<int>(matches.str(2));
        month = from_string<int>(matches.str(1));
        year = from_string<int>(matches.str(3));
        // here make replace_pattern yyyy-mm-dd
        if(month<10 && day<10)
            replace_pattern = to_string(year)+"-0"+to_string(month)+"-0"+to_string(day); // both day and month need the leading '0'
        else if(month<10)
            replace_pattern = to_string(year)+"-0"+to_string(month)+"-"+to_string(day);
        else if(day<10)
            replace_pattern = to_string(year)+"-"+to_string(month)+"-0"+to_string(day);
        else
            replace_pattern = to_string(year)+"-"+to_string(month)+"-"+to_string(day);
        // line is never modified, so riter stays valid; edits go to outputLine
        outputLine = regex_replace(outputLine, regex(matched_substring), replace_pattern); // regex_replace() returns a string
        // since I want to replace only 1 matched substring *riter, I use the exact substring
        // in the place of regex pattern
        ++riter; // move to the next matched substring
    }
    ofs << outputLine << endl;
}
```
Upvotes: 1