Reputation: 1658
I need a function to escape a string containing regular-expression operators in an awk script.
I came across this 'ugly' solution:
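function escape_string( str )
{
    # Escape the backslash first so the backslashes added below are not doubled.
    # "\\\\&" (backslash + matched text) is used because a bare "\\\\" yields
    # only a single backslash under POSIX gsub rules, making the call a no-op.
    gsub( /\\/, "\\\\&", str );
    gsub( /\./, "\\.", str );
    gsub( /\^/, "\\^", str );
    gsub( /\$/, "\\$", str );
    gsub( /\*/, "\\*", str );
    gsub( /\+/, "\\+", str );
    gsub( /\?/, "\\?", str );
    gsub( /\(/, "\\(", str );
    gsub( /\)/, "\\)", str );
    gsub( /\[/, "\\[", str );
    gsub( /\]/, "\\]", str );
    gsub( /\{/, "\\{", str );
    gsub( /\}/, "\\}", str );
    gsub( /\|/, "\\|", str );
    return str;
}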
Any better ideas?
Upvotes: 7
Views: 2966
Reputation: 2855
I use this small utility function. It escapes far more than needed, but using character ranges makes life a lot easier:
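# Functions, listed alphabetically
function __(_) {
    # Wrap every ASCII punctuation character in its own bracket expression.
    # \140 is a backtick, written in octal to avoid an unpaired backtick in the source.
    gsub("[!-/:-@[-\140{-~]", "[&]", _)
    # Inside brackets, only ^ and \ still need a backslash.
    gsub(/\^|\\/, "\\\\&", _)
    return _
}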
[!]["][#][$][%][&]['][(][)][*][+][,][-][.][/]
0123456789 [:][;][<][=][>][?]
[@]ABCDEFGHIJKLMNOPQRSTUVWXYZ [[][\\][]][\^][_]
[`]abcdefghijklmnopqrstuvwxyz [{][|] [}] [~]
Placing them all inside individual bracket expressions prevents any accidental interpretation of adjoining chars.
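To see the effect on a concrete string, here is a minimal sketch that wraps the same two gsub calls in a shell function. The names `bracket_escape` and `TEXT` are made up for this illustration and are not part of the original function; `LC_ALL=C` is set on the assumption that the ranges should be interpreted by ASCII byte order.

```sh
#!/bin/sh
# Hypothetical wrapper: prints the bracket-escaped form of its first argument.
bracket_escape() {
    # The string is passed through the environment so awk does not
    # process backslash escapes in it, as it would with -v.
    LC_ALL=C TEXT="$1" awk '
    function bracket_escape(s) {
        gsub("[!-/:-@[-\140{-~]", "[&]", s)   # wrap each punctuation char in [ ]
        gsub(/\^|\\/, "\\\\&", s)             # backslash-escape ^ and \ inside the brackets
        return s
    }
    BEGIN { print bracket_escape(ENVIRON["TEXT"]) }
    '
}

bracket_escape 'a.b*c'   # prints a[.]b[*]c
```

Letters and digits pass through untouched, so the result stays fairly readable even though every punctuation character gets wrapped.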
I have a more complex version that also escapes the recognized escape sequences:
[\a][\b][\t][\n][\v][\14][\r]
I use \14 in lieu of \f so the gawk linter wouldn't complain all the time.
Upvotes: 1
Reputation: 785481
You can just use a single gsub with a bracket expression like this:
function escape_string( str ) {
    gsub(/[\\.^$(){}\[\]|*+?]/, "\\\\&", str)
    return str
}
& is a back-reference to the matched text, and \\\\ puts a literal backslash in front of it to escape the match.
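As a rough usage sketch, the escaped string can then serve as a dynamic regex that matches only the literal text. The wrapper name `escape_demo`, the `NEEDLE` environment variable, and the two sample input lines are assumptions made for this example.

```sh
#!/bin/sh
# Hypothetical wrapper: prints the stdin lines that contain $1 literally.
escape_demo() {
    # Passed through the environment so awk does not interpret
    # backslash escapes in the search string, as it would with -v.
    NEEDLE="$1" awk '
    function escape_string(str) {
        # Prefix every regex metacharacter with a backslash.
        gsub(/[\\.^$(){}\[\]|*+?]/, "\\\\&", str)
        return str
    }
    BEGIN { pat = escape_string(ENVIRON["NEEDLE"]) }
    $0 ~ pat { print }
    '
}

# Unescaped, the pattern a.b*c would also match "aXbbc".
printf '%s\n' 'cost: a.b*c' 'cost: aXbbc' | escape_demo 'a.b*c'
# prints: cost: a.b*c
```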
Upvotes: 8