user707549


Backslash in a Tcl regexp

Regarding regexp in Tcl: if I use the following pattern:

regexp "helloworld\[\\s]+.name."

to match the following output:

helloworld  (name)

It works. But do I need to add a "\" in front of the "]"? I have seen code written by others that does not escape the "]" with "\", and I want to know why.

Upvotes: 0

Views: 1305

Answers (3)

Hai Vu

Reputation: 40723

I believe this expression is what you want:

regexp {helloworld\s+.name.} $the_string

You don't need any square bracket at all.

Upvotes: 0

kostix

Reputation: 55473

One reason might be what Utkanos explained; the other is a Tcl-specific behavior: the [ character has special meaning wherever command substitution is allowed. Observe:

% proc foo {} { return y }
% puts x[foo]z
xyz

Consequently, when you work with a regex in Tcl (whether you specify it literally or construct it at runtime), you have to think about how Tcl will treat the string that forms the regex.

That's why, most of the time, you see a regex passed directly to the regexp command grouped with curly braces, { and }: this inhibits (most of) Tcl's substitutions and hence lets you write the regex "as is", almost in its plain syntax, without extra escaping.
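As a small sketch of the point above (the variable names are only for illustration), the snippet below prints the string that the regexp command actually receives in each case. It also suggests why the "]" in the question needs no backslash: once the "[" is escaped, no command substitution is open, so Tcl treats "]" as an ordinary character.

```tcl
# Double quotes: Tcl processes backslash sequences before regexp sees the pattern.
set quoted "helloworld\[\\s]+.name."

# Escaping the closing bracket as well is harmless: \] simply yields ].
set quoted_both "helloworld\[\\s\]+.name."

# Braces: the characters are passed through unchanged.
set braced {helloworld[\s]+.name.}

puts $quoted                               ;# helloworld[\s]+.name.
puts [string equal $quoted $quoted_both]   ;# 1
puts [string equal $quoted $braced]        ;# 1
puts [regexp $braced "helloworld  (name)"] ;# 1
```

All three spellings hand the same pattern to the regex engine; the braced form is just the easiest to read.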

But this obviously does not work well when you want to construct the pattern dynamically (say, embed the contents of a variable in it). Usually people resort to grouping the regex with double quotes, which then requires special escaping to prevent certain Tcl substitutions. A cleaner approach might be to construct the pattern using the append command.
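A minimal sketch of the append approach, assuming a hypothetical variable named word that holds the text to embed: the fixed parts of the pattern stay in braces, so they need no escaping, and only the variable is substituted.

```tcl
# Hypothetical variable holding the text to embed in the pattern.
set word "name"

# Fixed regex fragments are braced; only $word is substituted by Tcl.
set pattern {helloworld\s+.}
append pattern $word {.}

puts $pattern                               ;# helloworld\s+.name.
puts [regexp $pattern "helloworld  (name)"] ;# 1
```

Note that if the variable could contain regex metacharacters, they would be interpreted as such; this sketch assumes it holds plain letters.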

More info on grouping is here, here and here.

As to finding the book on the Internet, "Mastering Regular Expressions" is usually considered to be the book on the subject.

As a side note, in your particular example the square brackets are not needed at all: in regexes, they create bracket expressions (character classes), patterns that match a single character out of the specified set, and in your case the set consists of exactly one (meta)character, \s, which matches a single whitespace character in the input. So in this particular case the pattern helloworld\s+.name. would do just fine.
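To illustrate, here is a short check (the variable names are made up for the example) that the pattern with the brackets and the one without them match the same text:

```tcl
set input "helloworld  (name)"

# Same pattern with and without the one-element bracket expression.
regexp {helloworld[\s]+.name.} $input with_brackets
regexp {helloworld\s+.name.}   $input without_brackets

puts [string equal $with_brackets $without_brackets] ;# 1
```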

Upvotes: 1

Mitya

Reputation: 34556

No, because you are using [ with its special meaning, i.e. to define a character class. You would escape it with a backslash only if you wanted to match a literal [. Backslashes are used to escape characters which otherwise invoke special behaviour in a regexp.

(JavaScript)

var str = "[hello]";
str.match(/[a-z]+/); //resultant array: ['hello']
str.match(/\[[a-z]+\]/); //resultant array: ['[hello]']
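For comparison, a rough Tcl equivalent of the JavaScript lines above might look like this (m1 and m2 are illustrative variable names). The patterns are braced so that Tcl itself does not treat [ as the start of a command substitution:

```tcl
set str {[hello]}

# Unescaped brackets define a character class.
regexp {[a-z]+} $str m1
# Escaped brackets match the literal [ and ] characters.
regexp {\[[a-z]+\]} $str m2

puts $m1 ;# hello
puts $m2 ;# [hello]
```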

Upvotes: 0
