Reputation: 5871
As a simplified example, I want ^⬛+$ matched against ⬛⬛⬛ to yield a find match of ⬛⬛⬛.
r := regexp.MustCompile("^⬛+$")
matches := r.FindString("⬛️⬛️⬛️")
fmt.Println(matches)
But it doesn't match successfully even though this would work with regular ASCII characters.
I'm guessing there's something I don't know about Unicode matching, but I haven't found a decent explanation in the documentation yet.
Can someone explain the problem?
Upvotes: 1
Views: 2860
Reputation: 626896
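You need to account for all chars in the string. If you analyze the string you will see it contains three pairs of code points: U+2B1B (black large square), each followed by U+FE0F (variation selector-16).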
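One way to do that analysis is to print every code point in the string. This is only a minimal sketch: the helper name describeRunes is made up for illustration, and the subject string is written with escapes on the assumption that it is three U+2B1B/U+FE0F pairs.

```go
package main

import "fmt"

// describeRunes (hypothetical helper) returns one "U+XXXX" entry
// per code point in s, so invisible characters become visible.
func describeRunes(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// Assumed subject string: three squares, each followed by
	// variation selector-16, spelled out with escapes.
	s := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"

	// One entry per code point.
	fmt.Println(describeRunes(s))

	// %+q prints the string with non-ASCII characters escaped.
	fmt.Printf("%+q\n", s)
}
```

Ranging over a string in Go iterates by rune rather than by byte, which is why each square and each selector shows up as its own entry.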
So you need a regex that will match a string containing one or more sequences of \x{2B1B} followed by \x{FE0F} chars till end of string.
So you need to use
^(?:\x{2B1B}\x{FE0F})+$
See the regex demo.
Note that you can use \p{M} to match any Unicode mark character (general category M), which includes variation selectors such as U+FE0F:
^(?:\x{2B1B}\p{M})+$
See the Go demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Each square is U+2B1B followed by U+FE0F (variation selector-16).
	r := regexp.MustCompile(`^(?:\x{2B1B}\x{FE0F})+$`)
	matches := r.FindString("⬛️⬛️⬛️")
	fmt.Println(matches)
}
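The \p{M} variant can be sketched the same way. The version below is an assumption rather than the pattern from this answer: it makes the mark optional with ?, so squares with and without a trailing selector both match. The names squaresRe and matchesSquares are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// squaresRe accepts one or more U+2B1B squares, each optionally
// followed by a single Unicode mark such as U+FE0F.
var squaresRe = regexp.MustCompile(`^(?:\x{2B1B}\p{M}?)+$`)

// matchesSquares (hypothetical helper) reports whether s consists
// only of squares with optional trailing marks.
func matchesSquares(s string) bool {
	return squaresRe.MatchString(s)
}

func main() {
	withSelectors := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"
	plain := "\u2b1b\u2b1b\u2b1b"

	fmt.Println(matchesSquares(withSelectors)) // squares with selectors
	fmt.Println(matchesSquares(plain))         // squares without selectors
	fmt.Println(matchesSquares("abc"))         // unrelated input
}
```

Dropping the ? restores the stricter behavior, where every square must be followed by a mark.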
Upvotes: 4
Reputation: 24
The regular expression in the question matches only a string made up entirely of one or more ⬛ (U+2B1B, black large square).
The subject string is three pairs of black square box and variation selector-16. The variation selectors are invisible (on my terminal) and prevent a match.
Fix by removing the variation selectors from the subject string or adding the variation selector to the pattern.
Here's the first fix: https://go.dev/play/p/oKIVnkC7TZ1
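A minimal sketch of the first fix follows. It is not necessarily the same code as the linked playground. The helper name stripVS16 is made up for illustration, and the subject string is written with escapes so the selectors are visible in the source.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// plainSquares is the pattern from the question, with the square
// written as a code point escape.
var plainSquares = regexp.MustCompile(`^\x{2B1B}+$`)

// stripVS16 (hypothetical helper) removes every U+FE0F
// (variation selector-16) from s.
func stripVS16(s string) string {
	return strings.ReplaceAll(s, "\ufe0f", "")
}

func main() {
	subject := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"

	// Without stripping, the selectors get in the way of the match.
	fmt.Printf("%+q\n", plainSquares.FindString(subject))

	// After stripping, only the squares remain and the pattern matches.
	fmt.Printf("%+q\n", plainSquares.FindString(stripVS16(subject)))
}
```

Stripping the selector changes the subject string, so this suits cases where the emoji presentation hint does not need to be preserved. Otherwise, adding the selector to the pattern is the safer choice.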
Upvotes: 0