rmaddy

Reputation: 318934

How to determine the display count of a Swift String?

I've reviewed questions such as Get the length of a String and Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings? but neither covers this specific question.

This all started when I was trying to apply skin tone modifiers to emoji characters (see Add skin tone modifier to an emoji programmatically). That led me to wonder what happens when you apply a skin tone modifier to a regular character such as "A".

Examples:

let tonedThumbsUp = "👍" + "🏻" // 👍🏻
let tonedA = "A" + "🏾" // A🏾

I'm trying to detect that second case. The count of both of those strings is 1. And the unicodeScalars.count is 2 for both.

How do I determine if the resulting string appears as a single character when displayed? In other words, how can I determine if the skin tone modifier was applied to make a single character or not?

I've tried a few ways to dump information about the string but none give the desired result.

func dumpString(_ str: String) {
    print("Raw:", str, str.count)
    print("Scalars:", str.unicodeScalars, str.unicodeScalars.count)
    print("UTF16:", str.utf16, str.utf16.count)
    print("UTF8:", str.utf8, str.utf8.count)
    print("Range:", str.startIndex, str.endIndex)
    print("First/Last:", str.first == str.last, str.first, str.last)
}

dumpString("A🏽")
dumpString("\u{1f469}\u{1f3fe}")

Results:

Raw: A🏽 1
Scalars: A🏽 2
UTF16: A🏽 3
UTF8: A🏽 5
First/Last: true Optional("A🏽") Optional("A🏽")
Raw: 👩🏾 1
Scalars: 👩🏾 2
UTF16: 👩🏾 4
UTF8: 👩🏾 8
First/Last: true Optional("👩🏾") Optional("👩🏾")
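
Every view above prints as the same text, so only the counts carry information. As a minimal sketch (the helper name describeScalars is hypothetical, not part of any library), the scalars can be listed one by one with their code points and encoded widths, which makes it easier to see that both strings are simply a base scalar followed by a modifier scalar:

```swift
/// Hypothetical helper: returns one description per Unicode scalar,
/// giving its code point and its width in UTF-8 and UTF-16 code units.
func describeScalars(_ str: String) -> [String] {
    return str.unicodeScalars.map { scalar -> String in
        // Format the code point as at least four uppercase hex digits.
        let hex = String(scalar.value, radix: 16, uppercase: true)
        let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        // Wrap the single scalar in a String to measure its encoded widths.
        let single = String(Character(scalar))
        return "U+\(padded) utf8=\(single.utf8.count) utf16=\(single.utf16.count)"
    }
}

for line in describeScalars("A🏽") {
    print(line)
}
// U+0041 utf8=1 utf16=1
// U+1F3FD utf8=4 utf16=2
```

This only exposes the encoding. It says nothing about how the string is drawn, which is the part the question is really about.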

Upvotes: 3

Views: 542

Answers (2)

matt

Reputation: 535964

I think it might be possible to reason about this by looking to see whether the modifier is present and if so whether it has increased the character count.

So for example:

let tonedThumbsUp = "👍" + "🏻"
let tonedA = "A" + "🏻"
tonedThumbsUp.count // 1
tonedThumbsUp.unicodeScalars.count // 2
tonedA.count // 2
tonedA.unicodeScalars.count // 2
let c = "\u{1F3FB}"
tonedThumbsUp.contains(c) // true
tonedA.contains(c) // true

Okay, so they both contain a modifier character, and they both contain two unicode scalars, but one is count 1 and the other is count 2. Surely that's a useful distinction.
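
Because the character count can depend on the Unicode version the runtime uses for grapheme breaking, a related check is to look at the scalars directly. The following is a sketch only, assuming Swift 5's Unicode.Scalar.Properties (isEmojiModifier and isEmojiModifierBase); the function name hasAppliedModifier is hypothetical. It consults Unicode data, not the font, so it reports what should combine rather than what actually gets drawn:

```swift
/// Hypothetical helper: true when the string contains at least one emoji
/// modifier and every such modifier directly follows a scalar that Unicode
/// marks as an emoji modifier base.
func hasAppliedModifier(_ str: String) -> Bool {
    var previous: Unicode.Scalar? = nil
    var sawModifier = false
    for scalar in str.unicodeScalars {
        if scalar.properties.isEmojiModifier {
            sawModifier = true
            // A modifier sequence is a base immediately followed by a modifier;
            // anything else (such as "A" before it) is not a valid sequence.
            guard let base = previous, base.properties.isEmojiModifierBase else {
                return false
            }
        }
        previous = scalar
    }
    return sawModifier
}

print(hasAppliedModifier("👍" + "🏻")) // true
print(hasAppliedModifier("A" + "🏻"))  // false
```

A string with no modifier at all also returns false here, so this answers "was a modifier validly applied", not "does this display as one character".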

Upvotes: 1

rob mayoff

Reputation: 385998

What happens if you print 👍🏻 on a system that doesn't support the Fitzpatrick modifiers? You get 👍 followed by whatever the system uses for an unknown character placeholder.

So I think to answer this, you must consult your system's typesetter. For Apple platforms, you can use Core Text to create a CTLine and then count the line's glyph runs. Example:

import Foundation
import CoreText

func test(_ string: String) {
    let richText = NSAttributedString(string: string)
    let line = CTLineCreateWithAttributedString(richText as CFAttributedString)
    let runs = CTLineGetGlyphRuns(line) as! [CTRun]
    print(string, runs.count)
}

test("👍" + "🏻")
test("A" + "🏾")
test("B\u{0300}\u{0301}\u{0302}" + "🏾")

Output from a macOS playground in Xcode 10.2.1 on macOS 10.14.6 Beta (18G48f):

👍🏻 1
A🏾 2
B̀́̂🏾 2

Upvotes: 3
