Jack BeNimble

Reputation: 36713

Objective-C - does not read UTF-8 encoded file

I'm trying to display some Japanese text on the iOS Simulator and an iPod touch. The text is read from an XML file. The header is:

<?xml version="1.0" encoding="utf-8"?>

When the text is in English, it displays fine. However, when the text is Japanese, it comes out as an unintelligible mishmash of single-byte characters.

I have tried saving the file explicitly as Unicode using TextEdit. I'm using NSXMLParser to parse the data. Any ideas would be much appreciated.

Here is the parsing code:

    // Override point for customization after application launch.

    NSString *xmlFilePath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"questionsutf8.xml"];
    NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath];

    NSData *data = [NSData dataWithBytes:[xmlFileContents UTF8String] length:[xmlFileContents lengthOfBytesUsingEncoding: NSUTF8StringEncoding]];                   

    XMLReader *xmlReader = [[XMLReader alloc] init];

    [xmlReader parseXMLData: data];

Upvotes: 0

Views: 3191

Answers (2)

dreamlax

Reputation: 95355

stringWithContentsOfFile: is a deprecated method. It does not detect the encoding unless the file begins with an appropriate byte order mark; without one, it interprets the file using the default C string encoding (the encoding returned by the +defaultCStringEncoding method). Instead, you should use the non-deprecated (and encoding-detecting) method stringWithContentsOfFile:usedEncoding:error:.

You can use it like this:

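NSStringEncoding enc;   // receives the encoding that was detected
NSError *error = nil;   // filled in only if the read fails
NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath
                                                  usedEncoding:&enc
                                                         error:&error];

if (xmlFileContents == nil)
{
    // Reading or decoding failed; log the reason and bail out.
    NSLog (@"%@", error);
    return;
}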

Upvotes: 2

Sherm Pendley

Reputation: 13622

First, you should verify with TextWrangler (free from the Mac App Store or barebones.com) that your XML file truly is UTF-8 encoded.
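
If you would rather check from code than from an editor, a rough sketch along these lines may help. FileIsValidUTF8 is a hypothetical helper, not something from this thread, and the sketch assumes ARC. It only tells you whether the bytes decode as UTF-8; a plain ASCII file also passes, so treat a YES as "not obviously wrong" rather than proof.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original question or answers):
// returns YES if the file's raw bytes can be decoded as UTF-8.
static BOOL FileIsValidUTF8(NSString *path) {
    NSData *bytes = [NSData dataWithContentsOfFile:path];
    if (bytes == nil) {
        return NO; // file is missing or unreadable
    }
    // initWithData:encoding: returns nil when the bytes are not valid
    // for the requested encoding.
    NSString *decoded = [[NSString alloc] initWithData:bytes
                                              encoding:NSUTF8StringEncoding];
    return decoded != nil;
}
```

In the question's setup you would call this with xmlFilePath and log the result before handing anything to the parser.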

Second, try creating xmlFileContents with +stringWithContentsOfFile:encoding:error:, explicitly specifying UTF-8 encoding. Or, even better, bypass the intermediate string entirely and create data directly from the file with +dataWithContentsOfFile:.
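
A minimal sketch of both options, in case it helps. The helper names XMLDataAtPath and UTF8StringAtPath are made up for illustration; the Foundation calls inside them are the ones named above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: read the file's raw bytes, skipping the
// intermediate NSString (and any encoding guesswork) entirely.
static NSData *XMLDataAtPath(NSString *path) {
    return [NSData dataWithContentsOfFile:path];
}

// Hypothetical helper: decode the file as UTF-8 explicitly.
// Returns nil and fills *error if the file cannot be read or decoded.
static NSString *UTF8StringAtPath(NSString *path, NSError **error) {
    return [NSString stringWithContentsOfFile:path
                                     encoding:NSUTF8StringEncoding
                                        error:error];
}
```

With the code from the question, the first option would presumably look like NSData *data = XMLDataAtPath(xmlFilePath); followed by [xmlReader parseXMLData:data];. The parser then sees the bytes exactly as they are on disk and can honor the encoding declared in the XML header.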

Upvotes: 1
