Reputation: 20376
I am trying to obtain the hex codepoint for emojis.
The code below successfully returns the hex code point for emojis made of a single code point (e.g. 1f58d for 🖍️):
NSData *data = [@"🖍️" dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
uint32_t unicode;
[data getBytes:&unicode length:sizeof(unicode)];
NSLog(@"%x", unicode);
However, for emojis like "🤲🏾", which consists of the code points "1f932-1f3ff", the method above only returns the first point, "1f932". How can I get the full hex code point sequence for emojis with multiple code points (any code approach is fine)? (Note that certain emojis, such as "🚣‍♀️", consist of up to five code points.)
Upvotes: 0
Views: 411
Reputation: 331
- (NSArray<NSNumber *> *)unicodeCodePoints:(NSString *)unicodeChar
{
    NSMutableArray *codePoints = [[NSMutableArray alloc] init];
    NSData *data = [unicodeChar dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
    const UInt32 *arr = (const UInt32 *)data.bytes;
    for (NSUInteger i = 0; i < data.length / sizeof(UInt32); i++)
    {
        [codePoints addObject:@(arr[i])];
    }
    return codePoints;
}
Then you could call it like this:
for (NSNumber *num in [self unicodeCodePoints:@"🚣‍♀️"])
{
    NSLog(@"%0*x", (int)(2 * sizeof(UInt32)), (UInt32)[num unsignedIntegerValue]);
}
Please note this assumes a single unicode character is represented by the NSString argument.
Upvotes: 1
Reputation: 10137
You need to change uint32_t to uint64_t.
NSData *data = [@"🤲🏾" dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
uint64_t unicode;
[data getBytes:&unicode length:sizeof(unicode)];
NSLog(@"%llx", unicode);
Upvotes: 2