Reputation: 7491
In my iOS app, I have a feature that writes data to a CSV file. This works fine in most cases with the following:
[csvString writeToFile: filePath atomically:YES encoding: NSUTF8StringEncoding error:&error];
I recently received an email from a Japanese user saying that the exported CSV file shows weird symbols instead of Japanese characters. So I switched to using NSUTF16StringEncoding and it seems to work fine for Japanese characters as well.
So the question is: is it better to use NSUTF16StringEncoding, or are there any drawbacks to doing this? It seems that other examples I've seen for writing to CSV files (including CHCSVParser) use NSUTF8StringEncoding, so I'm not sure which one to prefer.
Thanks.
Upvotes: 0
Views: 957
Reputation: 7336
There's no single "better" encoding.
UTF-8 uses a variable number of bytes per character, from 1 to 4. UTF-16 uses 2 bytes for most characters (and 4 for characters outside the Basic Multilingual Plane, encoded as surrogate pairs). Which one is best really depends on you and your users. If your users are mostly based in Asia and write primarily non-ASCII text, files encoded in UTF-16 are smaller, since most CJK characters take 2 bytes in UTF-16 but 3 in UTF-8. If your users primarily use Latin-based alphabets, UTF-8 makes the files roughly 50% smaller, since ASCII characters take 1 byte instead of 2.
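You can check the size difference for your own data directly. A minimal sketch using `-lengthOfBytesUsingEncoding:` (the sample strings are just placeholders):

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *latin = @"Hello, world";      // ASCII-only sample
        NSString *japanese = @"こんにちは世界";   // CJK sample

        // Bytes needed to store each string in UTF-8 vs UTF-16
        // (not counting any BOM that a file writer may prepend).
        NSLog(@"Latin:    UTF-8 %lu bytes, UTF-16 %lu bytes",
              (unsigned long)[latin lengthOfBytesUsingEncoding:NSUTF8StringEncoding],
              (unsigned long)[latin lengthOfBytesUsingEncoding:NSUTF16StringEncoding]);
        NSLog(@"Japanese: UTF-8 %lu bytes, UTF-16 %lu bytes",
              (unsigned long)[japanese lengthOfBytesUsingEncoding:NSUTF8StringEncoding],
              (unsigned long)[japanese lengthOfBytesUsingEncoding:NSUTF16StringEncoding]);
    }
    return 0;
}
```

For the ASCII-only string, UTF-8 comes out smaller; for the Japanese string, UTF-16 does.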
I believe your problem is not with the choice of encoding, but rather with the presentation. Text editors cannot reliably guess the encoding of a file, so it's possible that your Japanese user's editor or spreadsheet app defaulted to a different encoding (such as Shift-JIS) and thus could not decode the UTF-8 byte sequences correctly. The solution is to use the BOM sequence, as per this SO answer: https://stackoverflow.com/a/2585194/192024 (in short: just add those 3 bytes at the beginning of the file to tell editors what encoding to use).
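Adapting that to your existing `writeToFile:` call, a minimal sketch: prepending the Unicode BOM character (U+FEFF) to the string makes `NSUTF8StringEncoding` emit the 3-byte UTF-8 BOM (`EF BB BF`) at the start of the file. The `csvString` and `filePath` variables are assumed from your question.

```objc
// Prepend U+FEFF so the written file starts with the UTF-8 BOM,
// which tells editors (including Excel) that the file is UTF-8.
NSString *bomCSV = [@"\uFEFF" stringByAppendingString:csvString];

NSError *error = nil;
BOOL ok = [bomCSV writeToFile:filePath
                   atomically:YES
                     encoding:NSUTF8StringEncoding
                        error:&error];
if (!ok) {
    NSLog(@"CSV write failed: %@", error);
}
```

This keeps your files in UTF-8 (smaller for mixed Latin/CJK content, and what most CSV tooling expects) while still opening correctly for your Japanese users.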
Upvotes: 1