I'm trying to do a simple regex match using NSRegularExpression, but I'm having some problems matching the string when the source contains multibyte characters:
let string = "D 9"
// The following matches (any characters)(SPACE)(numbers)(any characters)
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
let slen : Int = string.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)
var error: NSError? = nil
var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.DotMatchesLineSeparators, error: &error)
var result = regex?.stringByReplacingMatchesInString(string, options: nil, range: NSRange(location:0,
length:slen), withTemplate: "First \"$1\" Second: \"$2\"")
The code above returns "D" and "9", as expected.
If I now change the first line to include a UK 'Pound' currency symbol as follows:
let string = "£ 9"
Then the match doesn't work, even though the ([\\s\\S]*) part of the expression should still match any leading characters.
I understand that the £ symbol takes two bytes in UTF-8, but the wildcard leading match should ignore that, shouldn't it?
Can anyone explain what is going on here, please?
I've run into this a couple of times, and Martin's answer helped me understand the problem. Here's a quick version of the solution that worked for me.
If your regular expression function includes a range parameter built like this:
NSRange(location: 0, length: yourString.count)
You can change it to this:
NSRange(location: 0, length: yourString.utf16.count)
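A minimal sketch of this in Swift 4+ syntax, assuming Foundation's NSRegularExpression. The helper name fullRange(of:) is hypothetical; NSRange(text.startIndex..., in: text) is Foundation's built-in way to get the same whole-string range:

```swift
import Foundation

// Hypothetical helper: an NSRange spanning the whole string, measured in
// UTF-16 code units (the unit NSString and NSRegularExpression use).
func fullRange(of text: String) -> NSRange {
    return NSRange(location: 0, length: text.utf16.count)
}

let text = "£ 9"

// Foundation's conversion from a Swift range yields the same NSRange.
let builtIn = NSRange(text.startIndex..., in: text)

// The pattern is a literal known to be valid, so try! is used for brevity.
let regex = try! NSRegularExpression(pattern: "([\\s\\S]*) ([0-9]*)(.*)")
let match = regex.firstMatch(in: text, options: [], range: fullRange(of: text))

print(fullRange(of: text).length, builtIn.length, match != nil) // 3 3 true
```

Using .count instead would give the number of Characters, which can differ from the UTF-16 length (for example, with emoji), so .utf16.count is the safer choice for NSRange.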
It can be confusing. The first parameter of stringByReplacingMatchesInString() is mapped from NSString in Objective-C to String in Swift, but the range: parameter is still an NSRange. Therefore you have to specify the range in the units used by NSString (UTF-16 code units):
var result = regex?.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
Alternatively, you can use count(string.utf16) (string.utf16.count in Swift 2 and later) instead of (string as NSString).length.
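To see why the question's range was wrong, this small sketch (Swift 4+ syntax) prints the three length measures for the question's string side by side. The emoji string is an extra illustration, not from the question. NSString(string:) is used as an equivalent of (string as NSString):

```swift
import Foundation

let string = "£ 9"

// "£" (U+00A3) is two bytes in UTF-8 but a single UTF-16 code unit.
let utf8Bytes = string.lengthOfBytes(using: .utf8) // the measure used in the question
let utf16Units = string.utf16.count                 // the measure NSRange expects
let nsLength = NSString(string: string).length      // same unit as utf16.count
print(utf8Bytes, utf16Units, nsLength)              // 4 3 3

// Characters and UTF-16 code units can differ too: an emoji outside the
// Basic Multilingual Plane is one Character but a surrogate pair in UTF-16.
let emoji = "😀 9"
print(emoji.count, emoji.utf16.count)               // 3 4
```

So the question passed a length of 4 for a string that NSString considers 3 units long, which is why the match failed.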
Full example:
let string = "£ 9"
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
var error: NSError? = nil
let regex = NSRegularExpression(pattern: pattern,
options: NSRegularExpressionOptions.DotMatchesLineSeparators,
error: &error)!
let result = regex.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
println(result)
// First "£" Second: "9"
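For readers on a newer compiler, here is a sketch of the same full example ported to Swift 4+ API spellings. The Range(_:in:) step at the end is an addition, not part of the original answer; it shows converting a capture group's NSRange back to a Swift String range without repeating the byte/code-unit mistake:

```swift
import Foundation

let string = "£ 9"
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"

// Swift 3+ replaces the NSError out-parameter with a throwing initializer;
// try! is used because the pattern is a literal known to be valid.
let regex = try! NSRegularExpression(pattern: pattern, options: [.dotMatchesLineSeparators])

// Whole-string range in UTF-16 code units, built from a Swift range.
let range = NSRange(string.startIndex..., in: string)

let result = regex.stringByReplacingMatches(in: string,
                                            options: [],
                                            range: range,
                                            withTemplate: "First \"$1\" Second: \"$2\"")
print(result) // First "£" Second: "9"

// Convert capture group 1's NSRange back into a Range<String.Index>.
var firstGroupText: String? = nil
if let match = regex.firstMatch(in: string, options: [], range: range),
   let groupRange = Range(match.range(at: 1), in: string) {
    firstGroupText = String(string[groupRange])
}
print(firstGroupText ?? "no match") // £
```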