Allan Green

Reputation: 21

Unexpected behaviour of regsub with escaped characters

The Tcl regsub command seems to behave strangely when I give it strings that include escaped characters.

I have used autoexpect to capture a series of screen displays from an app whose testing I want to automate. Rather than use its output as a single block, I am trying to turn the generated script into a series of character strings to improve maintainability. I've used vi to create a series of fragments, which I then read in one at a time and use as patterns with expect. I do have to make some substitutions (for example, "^[" becomes "ESC"), but I've got as far as fragment 5, so the idea is generally working. Unfortunately, I'm beaten by the substitution of "\[" with "[" in the pattern "xxxx\[[xxxx" (the x's are other characters).

I've written a Tcl ASCII string dump procedure, and I'm using it here.
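For anyone who wants to reproduce the byte-level view below without the original procedure, here is a minimal sketch of something similar. `hex_codes` is a hypothetical name, not the poster's `ascii_string_dump`, and it assumes every character is below U+0100 so each one prints as two hex digits.

```tcl
# hex_codes: hypothetical helper, not the poster's ascii_string_dump.
# Returns the code point of each character as two hex digits,
# assuming all characters are below U+0100.
proc hex_codes {s} {
    set codes {}
    foreach ch [split $s ""] {
        # scan %c yields the integer code point of a single character
        lappend codes [format %02x [scan $ch %c]]
    }
    return [join $codes " "]
}

puts [hex_codes "a\\\[\[z"]   ;# 61 5c 5b 5b 7a
```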

% ascii_string_dump "\\\[" 0 8 pattern

*** ASCII dump of: pattern (  2 characters) ***
---------------------------------------------------------------------
| 0000 |    \   [ ... ... ... ... ... ... | 5c 5b .. .. .. .. .. .. |
| 0008 |  ... ... ... ... ... ... ... ... | .. .. .. .. .. .. .. .. |
---------------------------------------------------------------------
% ascii_string_dump "a\\\[\[z" 0 8 test

*** ASCII dump of: test (  5 characters) ***
---------------------------------------------------------------------
| 0000 |    a   \   [   [   z ... ... ... | 61 5c 5b 5b 7a .. .. .. |
| 0008 |  ... ... ... ... ... ... ... ... | .. .. .. .. .. .. .. .. |
---------------------------------------------------------------------
% 
% regsub -all "\\\[" "a\\\[\[z" "Z" newstring
2
% ascii_string_dump $newstring 0 8 newstring

*** ASCII dump of: newstring (  5 characters) ***
---------------------------------------------------------------------
| 0000 |    a   \   Z   Z   z ... ... ... | 61 5c 5a 5a 7a .. .. .. |
| 0008 |  ... ... ... ... ... ... ... ... | .. .. .. .. .. .. .. .. |
---------------------------------------------------------------------
% 

In the above series, I first check that I can create the 2-character pattern "\[". I then create a test string that is an abbreviated version of my real problem string, "a\[[z". Then I pass the regexp and the test string to regsub, hoping to replace the "\[" characters with a single "Z". As you can see, two substitutions have occurred (rather than one), and there is an unexpected "\" at character 2!

Any enlightenment is very welcome. I've spent a lot of time on this (including writing the ASCII dump proc!), but I'm getting nowhere...

Best wishes, Allan

Upvotes: 0

Views: 81

Answers (1)

Jerry

Reputation: 71538

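This is how regular expressions work in most languages; the extra twist is that Tcl applies its own backslash substitution inside double quotes before regsub ever sees the pattern.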

If you use braces (which, like raw strings in other languages, suppress Tcl's substitution), your regsub command would look like this:

regsub -all {\[} {a\[[z} "Z" newstring

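And in regular expressions, \[ represents the literal character [ (the \ escapes the metacharacter [, which otherwise indicates the beginning of a character class). That is exactly what your quoted pattern "\\\[" became once Tcl had processed it, so regsub replaced each of the two [ characters and left the backslash untouched.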
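A small sketch of why that backslash matters to the regex engine: an unescaped `[` opens a bracket expression and, with nothing to close it, the pattern fails to compile, while `\[` matches the literal character. The exact error text may differ between Tcl versions, so the sketch only checks that an error is raised.

```tcl
# An unescaped [ starts a bracket expression; with no closing ] the
# pattern cannot be compiled and regsub raises an error.
set failed [catch {regsub -all {[} {a\[[z} Z out} msg]
puts "unescaped bracket failed: $failed ($msg)"

# \[ is a literal [, so both [ characters in a\[[z are replaced
# and the backslash is left alone: 2 substitutions, result a\ZZz.
set count [regsub -all {\[} {a\[[z} Z out]
puts "count=$count result=$out"
```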

If you want to replace the string \[, then you need to match a backslash followed by an opening square bracket, represented in regular expressions as \\ and \[, so your braced regsub becomes:

regsub -all {\\\[} {a\[[z} "Z" newstring
puts $newstring
# aZ[z
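When the text to match comes from files, as with the fragments in the question, writing these escapes by hand gets tedious. One option, sketched here under the assumption that every regex metacharacter in the fragment should match literally, is a small helper that backslash-escapes them before the text is passed to regsub. `regex_quote` is a hypothetical name; it is not a built-in Tcl command.

```tcl
# regex_quote: hypothetical helper that prefixes every ARE metacharacter
# with a backslash so the result matches the input text literally.
proc regex_quote {text} {
    # The bracket expression matches any ARE metacharacter, including
    # backslash, square brackets and curly braces. In the replacement,
    # a doubled backslash is a literal backslash and & is the match.
    regsub -all {[][\\^$.|?*+(){}]} $text {\\&} quoted
    return $quoted
}

set pattern [regex_quote {\[}]
set count [regsub -all $pattern {a\[[z} Z newstring]
puts "count=$count result=$newstring"   ;# count=1 result=aZ[z
```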

If you want to use double quotes, you need a lot more escaping. Each character in \\\[ needs to be escaped; basically, you add a backslash in front of each one:

regsub -all "\\\\\\\[" "a\\\[\[z" "Z" newstring
puts $newstring
# aZ[z
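One way to confirm that the quoted and braced spellings produce the same pattern is to compare the strings after Tcl's substitution has run; both reach regsub as the same four characters.

```tcl
# Braces suppress substitution; double quotes apply backslash
# substitution first, so the quoted form collapses to the same 4 chars.
set braced {\\\[}
set quoted "\\\\\\\["
puts [string equal $braced $quoted]   ;# 1
puts [string length $quoted]          ;# 4

# The same collapse explains the question: "\\\[" reaches regsub as \[,
# which as a regex matches a literal bracket only.
puts [string equal "\\\[" {\[}]       ;# 1
```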

Or, if you can, use string map:

string map {{\[} {Z}} {a\[[z}

or

string map {"\\\[" {Z}} "a\\\[\[z"

Either of these should do the job.
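Since the original goal was to turn `\[` into a plain `[` in fragments read from files, `string map` sidesteps regex escaping entirely. A sketch, assuming each fragment has already been read into a variable; the `fragment` value here is just the abbreviated test string from the question.

```tcl
# Text read from a file with gets/read undergoes no Tcl substitution,
# so a braced literal stands in for one fragment here.
set fragment {a\[[z}

# string map does plain substring replacement: \[ becomes [.
set cleaned [string map {{\[} {[}} $fragment]
puts $cleaned   ;# a[[z
```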

Upvotes: 1
