Reputation: 5082
The following two code practically did the same thing
for character in "Dog!đ¶".characters {
print(character)
}
for character in "Dog!đ¶".unicodeScalars {
print(character)
}
However, when I check for more detail information behind the sense, I found the difference. The characters
property is the type of CharacterView
while unicodeScalars
is the type of UnicodeScalarView
.
Question:
What's the difference between them?
Which property is preferred for what situation? (would be nice to have an example)
Many Thanks
Upvotes: 1
Views: 720
Reputation: 63271
This comes down to the difference between a Character
and UnicodeScalar
.
Unicode Scalars
Behind the scenes, Swiftâs native String type is built from Unicode scalar values. A Unicode scalar is a unique 21-bit number for a character or modifier, such as U+0061 for LATIN SMALL LETTER A ("a"), or U+1F425 for FRONT-FACING BABY CHICK ("đ„").
...
Extended Grapheme Clusters
Every instance of Swiftâs Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character.
Hereâs an example. The letter Ă© can be represented as the single Unicode scalar Ă© (LATIN SMALL LETTER E WITH ACUTE, or U+00E9). However, the same letter can also be represented as a pair of scalarsâa standard letter e (LATIN SMALL LETTER E, or U+0065), followed by the COMBINING ACUTE ACCENT scalar (U+0301). The COMBINING ACUTE ACCENT scalar is graphically applied to the scalar that precedes it, turning an e into an Ă© when it is rendered by a Unicode-aware text-rendering system.
From the Strings and Characters section of the Swift Programming Language Guide.
In most cases that I can think of you'll want to be dealing with Character
instances, as they as the smallest unit of Human language. I can't imagine a situation where you'd want to operate on a modifier without considering the full extended grapheme cluster.
Upvotes: 2