Reputation: 57
I am trying to match a regex pattern against a string in Swift. When I put the actual characters in the regex pattern, it works as expected. However, when I use Unicode escapes for the same characters, it does not match. Could you please help me find what is wrong here? I need to use the regex with Unicode escapes.
Code:
var input = "一" // U+4E00
extension String {
    var patternMatchesWithUnicode: Bool {
        // doesn't work
        return self.range(of: #"[\u{4E00}-\u{9FFF}]"#, options: .regularExpression) != nil
    }
    var patternMatchesWithString: Bool {
        // works
        return self.range(of: #"[一-鿿]"#, options: .regularExpression) != nil
    }
}
print(input.patternMatchesWithString)
print(input.patternMatchesWithUnicode)
Output:
true
false
Upvotes: 1
Views: 569
Reputation: 626927
You can use
extension String {
    var patternMatchesWithUnicode: Bool {
        return self.range(of: #"[\u4E00-\u9FFF]"#, options: .regularExpression) != nil
    }
}
These will also work:
return self.range(of: #"[\x{4E00}-\x{9FFF}]"#, options: .regularExpression) != nil
return self.range(of: #"[\U00004E00-\U00009FFF]"#, options: .regularExpression) != nil
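The regex flavor used here is ICU (range(of:options:) with .regularExpression goes through NSRegularExpression). Inside a raw string literal the escape reaches the regex engine unchanged, so it has to use ICU syntax rather than Swift's \u{...}. See the excerpt from the ICU docs page: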
\uhhhh
- Match the character with the hex value hhhh.
\Uhhhhhhhh
- Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\x{hhhh}
- Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh
- Match the character with two digit hex value hh.
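To see the difference side by side, here is a minimal sketch that runs the same input against three pattern spellings. The helper name matches(_:_:) and the constant names are made up for illustration; they are not from the question. It also shows one more option: dropping the raw-string delimiters so that Swift itself expands \u{...} into the actual characters before the regex engine sees the pattern.

```swift
import Foundation

let input = "一" // U+4E00

// Non-raw literal: Swift expands \u{...} at compile time, so the regex
// engine receives the literal characters, same as writing [一-鿿].
let swiftEscaped = "[\u{4E00}-\u{9FFF}]"

// Raw literal with ICU syntax: the backslash sequence is passed through
// untouched and ICU interprets \uhhhh itself.
let icuEscaped = #"[\u4E00-\u9FFF]"#

// Raw literal with Swift-style braces: this is the spelling from the
// question, which the asker reports as not matching.
let mixed = #"[\u{4E00}-\u{9FFF}]"#

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
func matches(_ text: String, _ pattern: String) -> Bool {
    return text.range(of: pattern, options: .regularExpression) != nil
}

print(matches(input, swiftEscaped)) // true
print(matches(input, icuEscaped))   // true
print(matches(input, mixed))
```

Either of the first two spellings should be fine; the raw-string form is usually easier to read once the pattern also contains regex escapes such as \d or \s, since those would need doubled backslashes in a non-raw literal.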
Upvotes: 2