Marc Fearby

Reputation: 1384

NSRegularExpression anomaly with string containing accented "é" character

I'm using the stringByReplacingMatchesInString method of NSRegularExpression to separate input strings into parts so that I can rearrange them. This was working well until I tested it against a string containing an accented "é".

Here's an Xcode playground demonstrating the problem. In this cut-down example (not very "real world", but enough to reproduce the issue), I match the whole string and then build a new string from a template that simply repeats the match: "$1 - $1".

import Cocoa

var err: NSError?
var regex = NSRegularExpression(pattern: "^(.*?)$", options: nil, error: &err)

let test = "homér simpson"
let r = NSMakeRange(0, count(test))

var str = regex!.stringByReplacingMatchesInString(test, options: nil, range: r, withTemplate: "$1 - $1")

The string "str" ends up being "homér simpso - homér simpson". As you can see, the first instance of $1 is truncated by 1 character, and I've found that this is because of the accented "é". If you edit it to use a plain "e", it's fine.

But here's the weird thing. If you edit it again to put the accented "é" back in the string, it behaves like it should and doesn't truncate.

I'm inclined to suspect the range passed to the method, but I thought count() was smart enough to handle Unicode characters.

Upvotes: 1

Views: 138

Answers (1)

Marc Fearby

Reputation: 1384

I think I've solved it by using this for the range:

let r = NSMakeRange(0, count(test.utf16))

Not entirely sure why the utf16 is necessary, but I can't argue with the result.
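
A likely explanation, sketched below: NSRange lengths are measured in UTF-16 code units (NSString's storage unit), whereas counting a Swift string's characters counts grapheme clusters. If the "é" is stored in decomposed form ("e" followed by the combining accent U+0301), it is one character but two UTF-16 units, so a range built from the character count stops one unit short and the final "n" falls outside the searched range. A retyped, precomposed "é" is a single unit, which would explain why the problem disappears after re-editing. The sketch uses current Swift and Foundation names (test.count, stringByReplacingMatches(in:...)) rather than the Swift 1.2 spellings in the question, and the decomposed literal is an assumption about what the original string contained.

```swift
import Foundation

// Assumption: the original string held a decomposed "é",
// i.e. "e" followed by U+0301 COMBINING ACUTE ACCENT.
let test = "home\u{301}r simpson"

// Counting characters treats "e" + accent as one grapheme cluster.
let characterCount = test.count
// Counting UTF-16 code units treats the accent as a separate unit.
let utf16Count = test.utf16.count

let regex = try! NSRegularExpression(pattern: "^(.*?)$", options: [])

// Range built from the character count: one unit too short,
// so the last code unit of the string is never searched.
let shortRange = NSRange(location: 0, length: characterCount)
let truncated = regex.stringByReplacingMatches(
    in: test, options: [], range: shortRange, withTemplate: "$1 - $1")

// Range built from the UTF-16 count covers the whole string.
let fullRange = NSRange(location: 0, length: utf16Count)
let correct = regex.stringByReplacingMatches(
    in: test, options: [], range: fullRange, withTemplate: "$1 - $1")

// Alternative: let Foundation convert the String range for you.
let convertedRange = NSRange(test.startIndex..<test.endIndex, in: test)

print(characterCount, utf16Count)
print(truncated)
print(correct)
```

In current Swift, NSRange(_:in:) is probably the safer habit, since it avoids having to remember which unit a given API counts in.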

Upvotes: 1
