Andrew

Reputation: 7873

Only 2 emoji return an incorrect length when compared against a character set containing them

let myString = "â˜ēī¸"

let emoji = "😀😁😂😃😄😅😆😇😈đŸ‘ŋ😉😊â˜ēī¸đŸ˜‹đŸ˜ŒđŸ˜đŸ˜ŽđŸ˜đŸ˜đŸ˜‘đŸ˜’đŸ˜“đŸ˜”đŸ˜•đŸ˜–đŸ˜—đŸ˜˜đŸ˜™đŸ˜šđŸ˜›đŸ˜œđŸ˜đŸ˜žđŸ˜ŸđŸ˜ đŸ˜ĄđŸ˜ĸđŸ˜Ŗ😤đŸ˜ĨđŸ˜Ļ😧😨😩đŸ˜ĒđŸ˜ĢđŸ˜ŦđŸ˜­đŸ˜ŽđŸ˜¯đŸ˜°đŸ˜ąđŸ˜˛đŸ˜ŗ😴đŸ˜ĩđŸ˜ļ😷🙂🙃🙄🤔🙁☹ī¸đŸ¤’đŸ¤•đŸ¤‘đŸ¤“đŸ¤—đŸ¤đŸ¤ đŸ¤¤đŸ¤Ĩ🤧đŸ¤ĸ🤡đŸ¤Ŗ"

let characterSet = CharacterSet(charactersIn: emoji)

let range = (myString as NSString).rangeOfCharacter(from: characterSet)
(myString as NSString).substring(with: range)
(range as NSRange).location
(range as NSRange).length
(myString as NSString).length

substring == myString

This code can be run in a Playground. Try changing myString to any of the face emoji.

I'm using NSString and NSRange here as their values are easier to demonstrate, but this has the exact same behaviour with a Swift String or Range.

When I set myString to most of the face emoji, the range comes back with a length of 2, and the substring can be used elsewhere as expected. With only two of them, the "smiling face" emoji ☺️ and the "frowning face" emoji ☹️, the range comes back with a length of 1. In all cases, the length of the string itself is 2. The substring taken with that length-1 range is incomplete: comparing it back to myString (i.e. comparing the match with the very string it came from) gives false. For those two emoji the range length should also be 2.
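To see where the missing half of ☺️ goes, here is a minimal sketch that reports what `rangeOfCharacter(from:)` matched and what it left behind. The helper `firstMatch(in:set:)` and the cut-down `faces` set are hypothetical, made up for this example. `CharacterSet` is a set of Unicode scalars, so the match for ☺️ covers only its first scalar (U+263A); the second UTF-16 unit, U+FE0F (VARIATION SELECTOR-16), falls outside the returned range.

```swift
import Foundation

// Hypothetical helper: report the matched range length, the matched scalars,
// and the scalars left over after the match (all as raw UInt32 values).
func firstMatch(in string: String, set: CharacterSet)
    -> (length: Int, matched: [UInt32], leftover: [UInt32]) {
    let ns = string as NSString
    let range = ns.rangeOfCharacter(from: set)
    let matched = ns.substring(with: range)
    let leftover = ns.substring(from: NSMaxRange(range))
    return (range.length,
            matched.unicodeScalars.map { $0.value },
            leftover.unicodeScalars.map { $0.value })
}

// A cut-down stand-in for the question's emoji set (assumption for this sketch).
let faces = CharacterSet(charactersIn: "\u{1F600}\u{263A}\u{FE0F}")

let grinning = firstMatch(in: "\u{1F600}", set: faces)        // 😀
let smiling = firstMatch(in: "\u{263A}\u{FE0F}", set: faces)  // ☺️

print(grinning)  // (length: 2, matched: [128512], leftover: [])
print(smiling)   // (length: 1, matched: [9786], leftover: [65039])
```

128512 is U+1F600, 9786 is U+263A and 65039 is U+FE0F.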

Interestingly, looking at the Unicode spec, those two emoji have very different code points from their neighbours: ☺️ is U+263A and ☹️ is U+2639, whereas a face like 😊 is U+1F60A.
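A quick way to inspect those code points is to dump each face's Unicode scalars and its UTF-16 length. This sketch uses only the Swift standard library, and `describe(_:)` is a hypothetical helper. It shows that 😊 is a single scalar outside the Basic Multilingual Plane (stored as a surrogate pair), while ☺️ and ☹️ are two BMP scalars each: the old text symbol plus U+FE0F, which requests emoji presentation. Both layouts have a UTF-16 (NSString) length of 2. This explains why the string length never changes while the matched range does.

```swift
// Hypothetical helper: list a string's scalars in U+XXXX form plus its UTF-16 length.
func describe(_ face: String) -> String {
    let scalars = face.unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    return "\(face): \(scalars.joined(separator: " ")), UTF-16 length \(face.utf16.count)"
}

// Escapes make the variation selector explicit: 😊, ☺️, ☹️
let samples = ["\u{1F60A}", "\u{263A}\u{FE0F}", "\u{2639}\u{FE0F}"]
for face in samples {
    print(describe(face))
}
// 😊: U+1F60A, UTF-16 length 2
// ☺️: U+263A U+FE0F, UTF-16 length 2
// ☹️: U+2639 U+FE0F, UTF-16 length 2
```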

This seems like it may be an iOS bug. I can't think of anything I could be personally doing incorrectly here, as it works with all other emoji.

Upvotes: 3

Views: 1211

Answers (1)

pbodsk

Reputation: 6876

Hardly an answer, but too much to fit into a comment, so bear with me :)

I don't know if you've already seen this, but I think your problem is addressed in the Platform State of the Union talk from WWDC 2017 (https://developer.apple.com/videos/play/wwdc2017/102/), in the section about what is new in Swift 4.

If you watch the video from about the 23 minutes 12 seconds mark, you'll see Ted Kremenek talk about how Swift 4 now splits Unicode characters as expected, using Unicode 9 grapheme breaking.
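In the spirit of that change, one way to sidestep scalar-level matching is to walk the string by `Character` (Swift's extended grapheme cluster) and accept a character if any of its scalars is in the set. This is a minimal sketch, not something from the talk. The function name `firstMatchingCharacter(in:from:)` is hypothetical, and the "any scalar matches" rule is an assumption that may be too loose for some sets. For example, because the question's set contains U+FE0F, any character carrying that selector would match.

```swift
import Foundation

// Hypothetical helper: return the first whole Character (grapheme cluster)
// that contains at least one scalar from `set`, so ☺️ keeps its U+FE0F.
func firstMatchingCharacter(in string: String, from set: CharacterSet) -> Character? {
    return string.first(where: { ch in
        ch.unicodeScalars.contains { set.contains($0) }
    })
}

let faces = CharacterSet(charactersIn: "\u{1F600}\u{263A}\u{FE0F}\u{2639}")
let myString = "\u{263A}\u{FE0F}"  // ☺️

if let match = firstMatchingCharacter(in: myString, from: faces) {
    print(String(match) == myString)  // true
    print(String(match).utf16.count)  // 2
}
```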

Also, have a look at this question and answer.

Yes... don't ask me to explain in detail what all this means, but it seems they're working on it :)

Upvotes: 1
