Sparsh

Backreferencing not working in Tcl regexp

I am new to regular expressions and Tcl, and I have been stuck on a very basic issue for a long time now.

I have been given the task of finding all the characters in a given word whose immediately following character is not identical to them. I have written the following Tcl snippet to do this:

set str "goooo";
set lst [regexp -all -inline {(\w)[^\1]} $str];
puts $lst

I am getting the following error:

couldn't compile regular expression pattern: invalid escape \ sequence
    while executing
"regexp -all -inline {(\w)[^ \1]} $str"

Is there any other way to use backreferences in Tcl?

Answers (1)

Wiktor Stribiżew

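Backreferences cannot be used inside bracket expressions in any regex flavor, because bracket expressions are meant to contain exact literal characters or ranges of them. In flavors such as PCRE, [^\1] is read as an octal escape and matches any character except \x01; Tcl does not accept the escape inside brackets at all, which is why your pattern fails to compile.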
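A minimal sketch, assuming a standard tclsh, that contrasts the two situations: a backreference outside a bracket expression works as usual, while the same escape inside brackets is rejected when the pattern is compiled. The variable names doubled, failed and msg are illustrative only.

```tcl
set str "goooo"

# Outside brackets, \1 refers back to group 1, so (\w)\1 finds a doubled word char.
# Without -all, -inline returns the full match followed by the submatch.
set doubled [regexp -inline {(\w)\1} $str]
puts $doubled  ;# oo o

# Inside a bracket expression the escape is not a backreference; Tcl refuses
# to compile the pattern, so catch the error instead of aborting the script.
set failed [catch {regexp -all -inline {(\w)[^\1]} $str} msg]
puts "failed=$failed: $msg"
```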

In your case, you can first collapse every run of repeated characters into a single character with regsub, using the pattern (\w)\1+ and the replacement \1 (a backreference to the captured character), and then extract the remaining word characters:

set lst [regexp -all -inline {\w} [regsub -all {(\w)\1+} $str {\1}]];

Here is a complete example:

set str "sddgoooo";
set lst [regexp -all -inline {\w} [regsub -all {(\w)\1+} $str {\1}]];
puts $lst

Output:

s d g o

Note that in other regex flavors you could use a negative lookahead instead: (\w)(?!\1). The (?!\1) lookahead matches a position that is not immediately followed by the Group 1 value. Unfortunately, although Tcl AREs support lookaheads in general, they do not allow backreferences inside a lookahead.
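Since Tcl rules out the lookahead, another option is to skip regular expressions entirely and compare each character with the one after it. This is a sketch, not part of the original answer; the proc name uniqueNextChars is hypothetical. Unlike the regsub approach, it also keeps non-word characters such as spaces or punctuation.

```tcl
# Hypothetical helper: return every character whose immediately following
# character differs from it (the last character always qualifies).
proc uniqueNextChars {str} {
    set result {}
    set n [string length $str]
    for {set i 0} {$i < $n} {incr i} {
        set cur [string index $str $i]
        # string index returns "" past the end, so the last char is kept
        set next [string index $str [expr {$i + 1}]]
        if {$cur ne $next} {
            lappend result $cur
        }
    }
    return $result
}

puts [uniqueNextChars "goooo"]     ;# g o
puts [uniqueNextChars "sddgoooo"]  ;# s d g o
```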
