Reputation:
I read the example code on the golang.org website. Essentially the code looks like this:
re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
The output is like this:
-T-T-
--xx-
---
-W-xxW-
I understand the first output, but I don't understand the other three. Can someone explain results 2, 3 and 4 to me? Thanks.
Upvotes: 8
Views: 12353
Reputation: 626870
The most intriguing line is fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W")). The docs say:
Inside repl, $ signs are interpreted as in Expand.
And Expand says:
In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores. A reference to an out of range or unmatched index or a name that is not present in the regular expression is replaced with an empty slice. In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.
So, in the 3rd replacement, $1W is treated as ${1W}, and since this group is not initialized, an empty string is used for the replacement.
When I say "the group is not initialized", I mean that the group is not defined in the regex pattern, so it was not populated during the match operation. Replacing means finding all matches and then substituting each one with the expanded replacement pattern. Backreferences ($xx constructs) are populated during the matching phase. The 1W group is missing from the pattern, so it was not populated during matching, and an empty string is used during the replacing phase.
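To see the difference side by side, here is a minimal sketch that runs the two replacement strings against the pattern from the question. The helper name replaceAll is made up for this example and is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceAll is a hypothetical helper for this example: it compiles the
// pattern and applies ReplaceAllString with the given replacement template.
func replaceAll(pattern, src, repl string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(src, repl)
}

func main() {
	const pattern = "a(x*)b"
	const src = "-ab-axxb-"

	// "$1W" is read as the name "1W"; no such group exists, so each
	// match is replaced with an empty string.
	fmt.Println(replaceAll(pattern, src, "$1W")) // ---

	// "${1}W" is group 1 followed by a literal W.
	fmt.Println(replaceAll(pattern, src, "${1}W")) // -W-xxW-
}
```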
The 2nd and 4th replacements are easy to understand and have been described in the other answers. $1 simply backreferences the characters captured by the first capturing group (the subpattern enclosed in a pair of unescaped parentheses), and the same goes for Example 4.
You can think of {} as a means to disambiguate the replacement pattern.
Now, if you need to make the results consistent, use a named capture (?P<1W>....):
re := regexp.MustCompile("a(?P<1W>x*)b") // <= See here, pattern updated
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
Results:
-T-T-
--xx-
--xx-
-W-xxW-
The 2nd and 3rd lines now produce consistent output since the named group 1W is also the first group, so the numbered backreference $1 points to the same text as the named backreference $1W.
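One way to check which names a pattern actually defines is SubexpNames, which returns the capture group names by index. The sketch below compares the original pattern with the named one. The helper groupNames is only an illustration, not something from the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupNames is a hypothetical helper for this example: it returns the
// capture group names of the pattern, indexed by group number.
func groupNames(pattern string) []string {
	return regexp.MustCompile(pattern).SubexpNames()
}

func main() {
	// Index 0 stands for the whole match and is always unnamed.
	fmt.Printf("%q\n", groupNames("a(x*)b"))       // ["" ""]
	fmt.Printf("%q\n", groupNames("a(?P<1W>x*)b")) // ["" "1W"]
}
```

With the named pattern, group 1 carries the name 1W, which is why $1 and $1W expand to the same text.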
Upvotes: 7
Reputation: 42413
$1 is a shorthand for ${1}.
${1} is the value of the first (1) group, i.e. the content of the first pair of (). This group is (x*), i.e. any number of x.
ReplaceAllString replaces every match. There are two matches. The first is ab, the second is axxb.
No 2. replaces every match with the content of the group: This is "" in the first match and "xx" in the second.
No 4. adds a "W" after the content of the group.
No 3. is left as an exercise. Hint: The twelfth capturing group would be $12.
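The hint can be tried out with the same pattern. This is a small sketch under the assumption that you want to contrast $12 with ${1}2. The helper name expandWith is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// expandWith is a hypothetical helper for this example: it applies the
// replacement template to every match of a(x*)b in src.
func expandWith(src, repl string) string {
	re := regexp.MustCompile("a(x*)b")
	return re.ReplaceAllString(src, repl)
}

func main() {
	// "$12" means group 12, which the pattern does not have, so each
	// match becomes an empty string.
	fmt.Println(expandWith("-ab-axxb-", "$12")) // ---

	// "${1}2" means group 1 followed by a literal 2.
	fmt.Println(expandWith("-ab-axxb-", "${1}2")) // -2-xx2-
}
```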
Upvotes: 3
Reputation: 2873
$number or $name refers to a subgroup of the regex, either by index or by name.
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
$1 is subgroup 1 of the regex, i.e. x*.
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
$1W is read as the name 1W. There is no subgroup with that name, so every match is replaced with an empty string.
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
$1 and ${1} are the same. Every match is replaced with the content of subgroup 1 followed by W.
For more information: https://golang.org/pkg/regexp/
Upvotes: 2