Reputation: 5779
I'm using std::regex_replace in a C++ Windows project (Visual Studio 2010). The code looks like this:
std::string str("http://www.wikipedia.org/");
std::regex fromRegex("http://([^@:/]+\\.)?wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
std::string fmt("https://$1wik$2.org/");
std::string result = std::regex_replace(str, fromRegex, fmt);
I would expect result to be "https://www.wikipedia.org/", but I get "https://www.wikipedia.wikipedia.org/".
A quick check with sed gives me the expected result:
$ cat > test.txt
http://www.wikipedia.org/
$ sed -E 's/http:\/\/([^@:\/]+\.)?wik(ipedia|imedia)\.org\//https:\/\/\1wik\2.org\//' test.txt
https://www.wikipedia.org/
I don't understand where the difference comes from. I checked the flags that can be used with std::regex_replace, but I didn't see one that would help in this case.
Update
These variants work fine:
std::regex fromRegex("http://([^@:/]+\\.)wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
std::regex fromRegex("http://((?:[^@:/]+\\.)?)wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
std::regex fromRegex("http://([a-z]+\\.)?wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
std::regex fromRegex("http://([^a]+\\.)?wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
but not these:
std::regex fromRegex("http://([^1-9]+\\.)?wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
std::regex fromRegex("http://([^@]+\\.)?wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
std::regex fromRegex("http://([^:]+\\.)?wik(ipedia|imedia)\\.org/", std::regex_constants::icase);
It makes no sense to me...
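For reference, the second working variant above can be wrapped up as a small function. This is only a sketch: the name rewrite_to_https is hypothetical, and the comments describe how a conforming ECMAScript regex engine treats the pattern, not what Visual Studio 2010 does internally. Moving the ? inside the capturing group means group 1 always takes part in the match, possibly as an empty string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites http:// Wikipedia/Wikimedia URLs to https://.
// The optional part sits in a non-capturing group *inside* group 1, so
// group 1 always participates in the match (it may be empty).
std::string rewrite_to_https(const std::string& url) {
    static const std::regex fromRegex(
        "http://((?:[^@:/]+\\.)?)wik(ipedia|imedia)\\.org/",
        std::regex_constants::icase);
    static const std::string fmt("https://$1wik$2.org/");
    // Input that does not match is returned unchanged.
    return std::regex_replace(url, fromRegex, fmt);
}
```

Calling rewrite_to_https("http://www.wikipedia.org/") should then give "https://www.wikipedia.org/", and a URL without a subdomain such as "http://wikimedia.org/" should become "https://wikimedia.org/".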
Upvotes: 2
Views: 646
Reputation: 76315
There's a subtle error in the regular expression. Don't forget that escape sequences in string literals are expanded by the compiler. So change
"http://([^@:/]+\.)?wik(ipedia|imedia)\.org/"
to
"http://([^@:/]+\\.)?wik(ipedia|imedia)\\.org/"
That is, replace each of the two single backslashes with a pair of backslashes.
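If the compiler supports C++11 raw string literals (Visual Studio 2010 does not, as far as I know), the doubling can be avoided altogether. A minimal sketch, where the function name to_https is made up for illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical helper, assuming a compiler with C++11 raw string literals.
// Inside R"(...)" the compiler does not process escape sequences, so the
// regex engine sees \. exactly as written.
std::string to_https(const std::string& url) {
    static const std::regex fromRegex(
        R"(http://([^@:/]+\.)?wik(ipedia|imedia)\.org/)",
        std::regex_constants::icase);
    // $1 and $2 refer to the first and second capture groups.
    return std::regex_replace(url, fromRegex, "https://$1wik$2.org/");
}
```

The pattern handed to std::regex here is character-for-character the same as the one produced by the doubled-backslash literal.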
EDIT: this doesn't seem to affect the problem, though. On the two implementations I tried (Microsoft and clang), the original problem doesn't occur, with or without the doubled backslashes. (Without them, you get compiler warnings about an invalid escape sequence, but the resulting . wildcard matches the . character in the target sequence, just as a \. would.)
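To illustrate the point about the wildcard, here is a small sketch comparing an escaped and an unescaped dot. The function names are hypothetical and the patterns are shortened to the relevant part:

```cpp
#include <regex>
#include <string>

// Escaped dot: only a literal '.' is accepted between the two words.
bool matches_escaped(const std::string& s) {
    static const std::regex re("wikipedia\\.org");
    return std::regex_search(s, re);
}

// Unescaped dot: a wildcard that matches any character except a line
// terminator, which includes a literal '.'.
bool matches_unescaped(const std::string& s) {
    static const std::regex re("wikipedia.org");
    return std::regex_search(s, re);
}
```

Both functions accept "wikipedia.org", which is why the missing escape makes no difference for the URL in the question. They only differ on input such as "wikipediaXorg", which the unescaped version also accepts.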
Upvotes: 3