Lacobus

Reputation: 1658

awk: function to escape regex operators from a string

I need a function that escapes the regex operators in a string, for use in an awk script.

I came across this 'ugly' solution:

function escape_string( str )
{
    gsub( /\\/, "\\\\",  str );
    gsub( /\./, "\\.", str );
    gsub( /\^/, "\\^", str );
    gsub( /\$/, "\\$", str );
    gsub( /\*/, "\\*", str );
    gsub( /\+/, "\\+", str );
    gsub( /\?/, "\\?", str );
    gsub( /\(/, "\\(", str );
    gsub( /\)/, "\\)", str );
    gsub( /\[/, "\\[", str );
    gsub( /\]/, "\\]", str );
    gsub( /\{/, "\\{", str );
    gsub( /\}/, "\\}", str );
    gsub( /\|/, "\\|", str );

    return str;
}

Any better ideas?

Upvotes: 7

Views: 2966

Answers (2)

RARE Kpop Manifesto

Reputation: 2855

I use this small utility function. It escapes far more than is strictly needed, but using character ranges keeps it short and makes life a lot easier:


function __(_) {
    # \140 is the octal escape for a backtick. I use it because I don't
    # like unpaired backticks dangling in my code.
    gsub("[!-/:-@[-\140{-~]", "[&]", _)
    gsub(/\^|\\/, "\\\\&", _)

    return _
}

   [!]["][#][$][%][&]['][(][)][*][+][,][-][.][/]
  0123456789                  [:][;][<][=][>][?]

 [@]ABCDEFGHIJKLMNOPQRSTUVWXYZ [[][\\][]][\^][_]
 [`]abcdefghijklmnopqrstuvwxyz [{][|] [}] [~]

Placing them all inside individual bracket expressions prevents any accidental interpretation of adjoining chars.
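As a rough illustration of the bracket-wrapping idea, here is a minimal sketch. The name bracket_escape is just a renamed copy of the function above, and the shell wrapper bracket_escape_demo and the sample string are hypothetical. LC_ALL=C is an added assumption, since range expressions such as !-/ are only well defined in the C locale.

```shell
# Hypothetical wrapper: print the bracket-escaped form of "$1" and check
# that the result, used as a dynamic regex, still matches the original text.
bracket_escape_demo() {
    LC_ALL=C awk -v s="$1" '
    function bracket_escape(str) {
        # Wrap every ASCII punctuation character in its own bracket expression.
        gsub("[!-/:-@[-\140{-~]", "[&]", str)
        # ^ and \ stay special inside brackets, so prefix them with a backslash.
        gsub(/\^|\\/, "\\\\&", str)
        return str
    }
    BEGIN {
        pattern = bracket_escape(s)
        print pattern
        result = (s ~ pattern) ? "match" : "no match"
        print result
    }'
}

bracket_escape_demo 'a.b*c^2'
# prints:
#   a[.]b[*]c[\^]2
#   match
```

Note that -v processes backslash escapes in the assigned value, so a string that itself contains backslashes is probably safer to pass in through ENVIRON.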

I have a more complex version that also escapes the recognized escape sequences:

 [\a][\b][\t][\n][\v][\14][\r]

I use \14 in lieu of \f so the gawk linter wouldn't complain all the time.

Upvotes: 1

anubhava

Reputation: 785481

You can do this with a single gsub and a bracket expression (character class), like this:

function escape_string( str ) {
   gsub(/[\\.^$(){}\[\]|*+?]/, "\\\\&", str)
   return str
}

& is a back-reference to the matched text, and \\\\ in the string literal produces the single literal backslash that is placed in front of it.
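To see what the escaping buys you, here is a small sketch that compares matching with the raw string against matching with the escaped one. The shell wrapper demo_escape and the sample inputs are hypothetical; escape_string is copied unchanged from above.

```shell
# Hypothetical demo: for each input line, report whether it matches the
# needle used as a raw regex and as an escaped (literal) regex.
demo_escape() {
    awk -v needle="$1" '
    function escape_string(str) {
        gsub(/[\\.^$(){}\[\]|*+?]/, "\\\\&", str)
        return str
    }
    {
        # Raw, "." and "*" act as operators; escaped, they match literally.
        raw = ($0 ~ needle)
        esc = ($0 ~ escape_string(needle))
        printf "%s raw=%d escaped=%d\n", $0, raw, esc
    }'
}

printf '%s\n' 'a.b*c' 'axbbc' | demo_escape 'a.b*c'
# prints:
#   a.b*c raw=0 escaped=1
#   axbbc raw=1 escaped=0
```

The raw pattern a.b*c matches axbbc but not the literal text a.b*c, which is exactly the kind of surprise the escaping is meant to prevent.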

Upvotes: 8
