Vinoth Kumar

Reputation: 57

Using Unicode escapes in a Swift regular expression

I am trying to match a regex pattern in a string in Swift. When I use the actual characters in the regex pattern, it works as expected. However, when I use Unicode escape sequences for the same characters, it does not. Could you please help me understand what is wrong here? I need to write the regex with Unicode escapes.

Code:

import Foundation

var input = "一" // U+4E00

extension String {
    var patternMatchesWithUnicode: Bool {
        // doesn't work
        return self.range(of: #"[\u{4E00}-\u{9FFF}]"#, options: .regularExpression) != nil
    }
    var patternMatchesWithString: Bool {
        // works
        return self.range(of: #"[一-鿿]"#, options: .regularExpression) != nil
    }
}

print(input.patternMatchesWithString)
print(input.patternMatchesWithUnicode)

Output:

true
false

Upvotes: 1

Views: 569

Answers (1)

Wiktor Stribiżew

Reputation: 626927

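Inside a raw string (#"..."#), Swift passes the backslashes to the regex engine unchanged, so ICU receives the literal text \u{4E00}. ICU's \u escape takes exactly four hex digits and has no brace form, which is why that pattern does not match. You can use the four-digit form instead: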

extension String {
    var patternMatchesWithUnicode: Bool {
        return self.range(of: #"[\u4E00-\u9FFF]"#, options: .regularExpression) != nil
    }
}

These will also work:

return self.range(of: #"[\x{4E00}-\x{9FFF}]"#, options: .regularExpression) != nil
return self.range(of: #"[\U00004E00-\U00009FFF]"#, options: .regularExpression) != nil
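A minimal sketch, assuming Foundation is available (Apple platforms, or swift-corelibs-foundation on Linux), that runs the three ICU spellings from the answer side by side. The fourth entry is an alternative not taken from the answer: an ordinary (non-raw) string literal, in which the Swift compiler itself expands \u{...} into the actual characters before ICU ever sees the pattern. The names patterns, matchesIdeograph and matchesLatin are illustrative only.

```swift
import Foundation

// The three ICU spellings from the answer, written as raw strings so the
// backslashes reach ICU untouched, plus an ordinary (non-raw) Swift literal
// in which the compiler expands \u{...} into the actual characters.
let patterns = [
    #"[\u4E00-\u9FFF]"#,          // ICU \uhhhh (exactly four hex digits)
    #"[\x{4E00}-\x{9FFF}]"#,      // ICU \x{hhhh} (one to six hex digits)
    #"[\U00004E00-\U00009FFF]"#,  // ICU \Uhhhhhhhh (exactly eight hex digits)
    "[\u{4E00}-\u{9FFF}]",        // Swift escape, resolved at compile time
]

for pattern in patterns {
    // Check one CJK ideograph and one plain Latin string against each pattern.
    let matchesIdeograph = "一".range(of: pattern, options: .regularExpression) != nil
    let matchesLatin = "abc".range(of: pattern, options: .regularExpression) != nil
    print(pattern, matchesIdeograph, matchesLatin)
}
```

Each pattern is expected to report a match for 一 and no match for abc. Note that the brace-less \u4E00 spelling only makes sense inside a raw string; in an ordinary Swift literal, \u must be followed by braces, so "\u4E00" does not compile.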

The regex flavor behind Foundation's .regularExpression option is ICU; see this excerpt from the ICU documentation:

\uhhhh - Match the character with the hex value hhhh.
\Uhhhhhhhh - Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\x{hhhh} - Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh - Match the character with two digit hex value hh.
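To illustrate the excerpt, here is a small sketch (same Foundation assumption; all variable names are hypothetical) that exercises each escape form. It uses U+1F600, a character outside the Basic Multilingual Plane, where the four-digit \uhhhh form is not enough: ICU reads only four digits, so \u1F600 means U+1F60 followed by a literal "0".

```swift
import Foundation

// U+1F600 lies outside the Basic Multilingual Plane.
let emoji = "😀"

// \x{...}: one to six hex digits, here five.
let viaBraces = emoji.range(of: #"\x{1F600}"#, options: .regularExpression) != nil

// \Uhhhhhhhh: exactly eight hex digits, so pad with leading zeros.
let viaEightDigits = emoji.range(of: #"\U0001F600"#, options: .regularExpression) != nil

// \xhh without braces: exactly two hex digits; 0x41 is "A".
let viaTwoDigits = "A".range(of: #"\x41"#, options: .regularExpression) != nil

// \uhhhh: exactly four digits, so this is U+1F60 followed by "0",
// which does not occur in the emoji string.
let viaFourDigits = emoji.range(of: #"\u1F600"#, options: .regularExpression) != nil

print(viaBraces, viaEightDigits, viaTwoDigits, viaFourDigits)
```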

Upvotes: 2
