Reputation: 275730
C++11 has 6 different regular expression grammars you can use. In my case, I am interacting with a component that is using modified ECMAScript regular expressions.
I need to create a regular expression "match a string starting with X", where X is a string literal I have.
So the regular expression I want is roughly ^X.*, except that the string X could contain regular expression special characters, and I want them to be matched literally.
Which means I really want ^escaped(X).* instead.
Now, I can read over the ECMAScript documentation, find all of the characters which have a special meaning, write a function that escapes them, and be done. But this seems inelegant, inefficient, and error-prone -- especially if I want to support all 6 kinds of regular expressions that C++ supports currently, let alone in the future.
Is there a simple way in the standard to escape a literal string to embed in a C++ regular expression, possibly as a function of the regular expression grammar, or do I have to roll my own?
Here is a similar question using the Boost library, where the list of escapes is hard-coded, and then a regular expression is generated that backslashes them. Am I reduced to adapting that answer for use in std?
Upvotes: 9
Views: 1228
Reputation: 131970
(Answering quite a while later, so the OP has probably worked something out, but still.)
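A preliminary comment: the regular expression you'll want, in ECMAScript (and many other) syntaxes, is ^X, and you don't need the extra .* afterwards, as long as the match is done as a search (e.g. std::regex_search) rather than a whole-string match (std::regex_match).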
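Whether the trailing .* can be dropped depends on how the match is run. A minimal sketch with std::regex (ECMAScript grammar), assuming a hypothetical literal prefix "a.b" whose dot has already been escaped by hand: std::regex_search only needs ^X, while std::regex_match has to consume the entire target and so still needs the .*. The function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers: does `text` start with the literal "a.b"?
// The '.' is escaped by hand, so it only matches a literal dot.

// regex_search accepts a match anywhere; '^' pins it to the start,
// so no trailing ".*" is needed.
bool starts_with_search(const std::string& text) {
    static const std::regex prefix(R"(^a\.b)");
    return std::regex_search(text, prefix);
}

// regex_match must match the entire target, so ".*" is required
// to consume whatever follows the prefix.
bool starts_with_match(const std::string& text) {
    static const std::regex whole(R"(^a\.b.*)");
    return std::regex_match(text, whole);
}

// For contrast: without ".*", regex_match fails on any longer string.
bool starts_with_match_no_tail(const std::string& text) {
    static const std::regex prefix_only(R"(^a\.b)");
    return std::regex_match(text, prefix_only);
}
```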
As for the approach to this task: You're asking for a general solution for all regex syntax options. Well, YAGNI - You ain't gonna need it. Unless you're writing a general-purpose library supposed to support all C++ regexp syntaxes, don't try to solve the whole world's problems yourself and right away. This is further emphasized by the fact that, since you wrote your question, additional regexp syntax options have been added to C++... so by C++17 it's, um, 10 I think. See here.
So I suggest you write something that is potentially extensible to other syntax options, but only actually works - for now - with the syntax option(s) you need. e.g.:
template <std::regex_constants::syntax_option_type SyntaxOption>
std::string escape_for_regex(const std::string_view sv);
or perhaps
template <std::regex_constants::syntax_option_type SyntaxOption>
std::string_view
escape_for_regex(
const std::string_view source,
std::span<char> destination // writable buffer (C++20 <span>); a string_view is read-only
);
in which the returned string_view indicates how much of the destination you're actually using. One can bike-shed about the signature some more (e.g. perhaps use iterators? ranges?)
and you'll specialize this for std::regex::ECMAScript. The implementation is provided in this SO question:
Is there a RegExp.escape function in JavaScript?
The answer is that there isn't one, but you could add it like so (in JavaScript, mind you):
RegExp.escape = function(s) {
return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
};
Moving that to C++, and using the first option for the function signature, this becomes:
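template <>
std::string escape_for_regex<std::regex::ECMAScript>(const std::string_view sv)
{
    // Every character that is special somewhere in an ECMAScript pattern
    // (plus '/', which only matters in JavaScript regex literals).
    static const std::regex to_escape("[-/\\\\^$*+?.()|[\\]{}]");
    // "$&" is the whole match (the pattern has no capture group, so "$1"
    // would be empty); the backslash in front of it is copied literally.
    const std::string escaped("\\$&");
    const std::string s{sv};
    return std::regex_replace(s, to_escape, escaped);
}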
Caveat: I haven't properly tested this. I also don't like the extra string construction; another one of the regex_replace overloads (e.g. the one that takes an input range and an output iterator) might be usable instead.
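Putting the pieces together, here is a sketch (not tested beyond the checks below) of the "extensible, but only implemented for what you need" idea: the primary template refuses to compile for grammars without a specialization, and the ECMAScript specialization uses the iterator overload of std::regex_replace so the string_view is read directly instead of being copied into a temporary std::string. The helper no_escaping_rules_for is a hypothetical name; C++17 is assumed.

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <string_view>

// Dependent "false": the static_assert below fires only if the primary
// template is instantiated, i.e. for a grammar with no specialization.
template <std::regex_constants::syntax_option_type>
inline constexpr bool no_escaping_rules_for = false;

// Primary template: extensible, but rejects grammars nobody has
// written escaping rules for yet.
template <std::regex_constants::syntax_option_type SyntaxOption>
std::string escape_for_regex(std::string_view) {
    static_assert(no_escaping_rules_for<SyntaxOption>,
                  "escape_for_regex: grammar not supported yet");
    return {};
}

// ECMAScript: same character set as the JavaScript RegExp.escape answer.
template <>
std::string escape_for_regex<std::regex::ECMAScript>(std::string_view sv) {
    static const std::regex to_escape(R"([-/\\^$*+?.()|[\]{}])");
    std::string result;
    result.reserve(sv.size());
    // Iterator overload: reads [sv.begin(), sv.end()) directly and appends
    // to result. "$&" is the whole match; the '\' is copied literally.
    std::regex_replace(std::back_inserter(result), sv.begin(), sv.end(),
                       to_escape, R"(\$&)");
    return result;
}
```

To get the question's "starts with X" pattern you would then build std::regex("^" + escape_for_regex<std::regex::ECMAScript>(x)) and use it with std::regex_search.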
Upvotes: 1
Reputation:
If you have to write your own, there are only two kinds you should need to know:
BRE and the rest.
The regexes below should work; use ECMAScript-type regexes to operate on the input string.
They are formulated using the special characters listed in this question:
What special characters must be escaped in regular expressions?
under the answer "Legacy RegEx Flavors (BRE/ERE)".
Both use the same replacement: "\\$1" (a literal backslash followed by capture group 1).
For BRE input:
# "(\\\\[+?(){}|]|[.^$*\\[\\]\\\\-])"
( # (1 start)
\\ [+?(){}|] # not sure this is needed (it's not needed)
|
[.^$*\[\]\\-]
) # (1 end)
For ERE or ECMAScript input:
# "([.^$*+?()\\[\\]{}\\\\|-])"
( [.^$*+?()\[\]{}\\|-] ) # (1)
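A minimal C++ sketch applying the two patterns above with std::regex_replace, running them as ECMAScript regexes over the input as suggested. The function names escape_for_bre and escape_for_ere are hypothetical, and the BRE pattern drops the first alternative that the comment marks as not needed. In std::regex_replace's default format only $ sequences are special, so R"(\$1)" inserts exactly one backslash before the captured character. The checks only compare the produced strings; they do not compile the results as std::regex::basic or std::regex::extended patterns.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers: each pattern captures one character that needs a
// backslash, and the replacement puts a single '\' in front of it.

// BRE: only . ^ $ * [ ] \ and - are escaped; + ? ( ) { } | stay as-is.
std::string escape_for_bre(const std::string& s) {
    static const std::regex special(R"(([.^$*\[\]\\-]))");
    return std::regex_replace(s, special, R"(\$1)");
}

// ERE / ECMAScript: also escapes + ? ( ) { } and |.
std::string escape_for_ere(const std::string& s) {
    static const std::regex special(R"(([.^$*+?()\[\]{}\\|-]))");
    return std::regex_replace(s, special, R"(\$1)");
}
```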
BRE input example:
Before -
+_)(*&^%$#@!asdfasfd hello
+ ? ( ) { } |
\+ \? \( \) \{ \} \|
\\+ \\? \\( \\) \\{ \\} \\|
}{":][';/.,<>?
here is
After -
+_)(\*&\^%\$#@!asdfasfd hello
+ ? ( ) { } |
\\+ \\? \\( \\) \\{ \\} \\|
\\\\+ \\\\? \\\\( \\\\) \\\\{ \\\\} \\\\|
}{":\]\[';/\.,<>?
here is
Upvotes: 1