I have noticed that if I print a byte array containing the UTF-8 representation of a string using the format specifier "%s", printf() gets it right but NSLog() garbles it. Each byte is printed as a separate character, so, for example, "¥" is printed as the two characters "¬•".
This is curious, because I always thought that NSLog() is just printf() plus a few extras.
My code:
NSString* string;
// (...fill string with unicode string...)
const char* stringBytes = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSUInteger stringByteLength = [string lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
stringByteLength += 1; // add room for '\0' terminator
char* buffer = calloc(stringByteLength, sizeof(char)); // calloc(count, size)
memcpy(buffer, stringBytes, stringByteLength); // copies the terminator too
NSLog(@"Buffer after copy: %s", buffer);
// (garbled, no matter what: each byte shown as a separate character)
printf("Buffer after copy: %s\n", buffer);
// (renders correctly, e.g. Japanese text)
free(buffer);
Somehow, it looks as if printf() is "smarter" than NSLog(). Does anyone know the underlying cause, and whether this behavior is documented anywhere? (I couldn't find anything.)
NSLog() and stringWithFormat: seem to expect the string for %s in the "system encoding" (for example, "Mac Roman" on my computer):
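NSString *string = @"¥";
// Look up the system encoding (Mac Roman on the machine used here)
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(CFStringGetSystemEncoding());
// Convert to a C string in that encoding (NULL if not losslessly representable)
const char* stringBytes = [string cStringUsingEncoding:enc];
NSString *log = [NSString stringWithFormat:@"%s", stringBytes];
NSLog(@"%@", log);
// Output: ¥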
Of course, this will fail if some characters are not representable in the system encoding. I could not find any official documentation for this behavior, but the example shows that using %s in stringWithFormat: or NSLog() does not reliably work with arbitrary UTF-8 strings.
If you want to check the contents of a char buffer containing a UTF-8 string, the following works with arbitrary characters (it uses the boxed expression syntax to create an NSString from a UTF-8 string):
NSLog(@"%@", @(utf8Buffer));