Reputation: 181
I'm trying to display the contents of a text file with an unknown encoding, following the procedure in Apple's documentation:
1. Try stringWithContentsOfFile:usedEncoding:error: or initWithContentsOfFile:usedEncoding:error: (or the URL-based equivalents). These methods try to determine the encoding of the resource, and if successful return by reference the encoding used.
2. If (1) fails, try to read the resource by specifying UTF-8 as the encoding.
3. If (2) fails, try an appropriate legacy encoding. "Appropriate" here depends a bit on circumstances; it might be the default C string encoding, it might be ISO or Windows Latin 1, or something else, depending on where your data is coming from.
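The three documented steps can be sketched in Swift roughly as follows. This is only an illustration, not code from Apple's documentation: the function name readText(at:legacyEncodings:), the error type and the default list of legacy encodings are assumptions made for the example.

```swift
import Foundation

/// Hypothetical error for this sketch: no attempted encoding could decode the file.
struct UnknownEncodingError: Error {}

/// Hypothetical helper: tries the three documented steps in order and
/// reports which encoding finally produced a string.
func readText(
    at url: URL,
    legacyEncodings: [String.Encoding] = [.windowsCP1252, .isoLatin1]
) throws -> (text: String, encoding: String.Encoding) {
    // Step 1: let Foundation try to determine the encoding itself.
    var detected = String.Encoding.utf8
    if let text = try? String(contentsOf: url, usedEncoding: &detected) {
        return (text, detected)
    }
    // Step 2: read the file explicitly as UTF-8.
    if let text = try? String(contentsOf: url, encoding: .utf8) {
        return (text, .utf8)
    }
    // Step 3: fall back to legacy encodings (the list is an assumption;
    // pick one that matches where your data comes from).
    for encoding in legacyEncodings {
        if let text = try? String(contentsOf: url, encoding: encoding) {
            return (text, encoding)
        }
    }
    throw UnknownEncodingError()
}
```

Note that a single-byte encoding such as ISO Latin 1 accepts any byte sequence, so a successful read in step 3 only means the bytes could be decoded, not that the guess was right.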
This does not always work. Is there a more reliable way to detect the encoding?
Upvotes: 3
Views: 625
Reputation: 21
Here is an answer for Swift (which may not have been relevant 10 years ago). While trying to post a question like this myself about Swift, I came across this very old thread when Stack Overflow asked whether my question was a duplicate. In fact, the answer from Denis still works.
I tried:
string = try String(contentsOf: url, encoding: .utf8)
but it just throws an error like: The file ... could not be opened using text encoding "Unicode (UTF-8)", and the string remains empty. So Denis's answer in Swift would be something like:
// Open the file independently of its encoding!
// `url`, `string`, `opened` and `Logger` are defined elsewhere in the app.
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [:]
var dict: NSDictionary?
do {
    let myString = try NSAttributedString(
        url: url,
        options: options,
        documentAttributes: &dict
    )
    // The detected encoding is reported as an NSStringEncoding raw value.
    // Unwrap it safely instead of force-casting, in case the key is missing.
    if let rawValue = dict?["CharacterEncoding"] as? UInt {
        let encoding = String.Encoding(rawValue: rawValue)
        Logger.write("Detected encoding: \(encoding)")
    }
    string = myString.string
    opened = true
} catch {
    Logger.write("\(error)")
    string = ""
    // Tell the user that the file could not be loaded.
    let alert = NSAlert()
    alert.alertStyle = .critical
    alert.messageText = """
    File \
    \(url.absoluteString) \
    could not be loaded
    """
    alert.informativeText = error.localizedDescription
    _ = alert.runModal()
}
The "NSError *error;" handling from the Objective-C version is covered by the catch block.
Just be aware that characters with codes above 127 might still be decoded incorrectly, but at least the text can be opened. So Remy Lebeau is correct as well!
In another thread it was proposed to ask the user for the encoding. This will very likely fail, as even I would not know what to answer for an unknown text file. Even worse, in my experience encodings like CP437 (DOS German) can't be converted correctly by Apple's API without an appropriate table for the high byte codes. Visual Studio Code does a very good job here: it correctly recognized every encoding I tested and can convert such files to UTF-8.
Comment: Apple's header files for NSAttributedString are not converted correctly for Swift, as they state the deprecations only in terms of Objective-C.
Upvotes: 0
Reputation: 795
You should use NSAttributedString, which can detect the encoding. After a long time testing different solutions, this is what I use:
NSError *error = nil;
NSDictionary *options = [NSDictionary dictionary];
NSDictionary *attributes = nil;
NSAttributedString *theString = [[NSAttributedString alloc] initWithURL:fileURL options:options documentAttributes:&attributes error:&error];
// NSStringEncoding is an unsigned integer type, so read the value as unsigned.
NSStringEncoding detectedEncoding = [[attributes objectForKey:@"CharacterEncoding"] unsignedIntegerValue];
I tested many files from many sources and environments, and it seems to work reliably (though you should still check whether error is nil or not). For a plain CSV file exported from Excel, I get this attributes dictionary (the value 30 means NSMacOSRomanStringEncoding):
{
CharacterEncoding = 30;
DocumentType = NSPlainText;
UTI = "public.plain-text";
}
Upvotes: 1
Reputation: 597215
If you do not know the encoding of the data ahead of time, then it has to be guessed through analysis of the raw data, and that can sometimes lead to wrong guesses and thus unreliable decoding. When in doubt, just ask the user which encoding to use.
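As a small illustration of what "analysis of the raw data" can mean, here is a minimal Swift sketch that checks only for a Unicode byte-order mark (BOM). It is not part of the original answer, the function name encodingFromBOM is hypothetical, and it covers just the one case where the bytes state their encoding explicitly. For files without a BOM it returns nil, and any further guess is a heuristic that can be wrong.

```swift
import Foundation

/// Hypothetical helper: returns the encoding implied by a leading
/// byte-order mark, or nil if the data does not start with one.
func encodingFromBOM(_ data: Data) -> String.Encoding? {
    let bytes = [UInt8](data.prefix(4))
    if bytes.starts(with: [0xEF, 0xBB, 0xBF]) { return .utf8 }
    // The UTF-32 checks must come before UTF-16, because the UTF-32 LE
    // BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    if bytes.starts(with: [0xFF, 0xFE, 0x00, 0x00]) { return .utf32LittleEndian }
    if bytes.starts(with: [0x00, 0x00, 0xFE, 0xFF]) { return .utf32BigEndian }
    if bytes.starts(with: [0xFF, 0xFE]) { return .utf16LittleEndian }
    if bytes.starts(with: [0xFE, 0xFF]) { return .utf16BigEndian }
    return nil
}
```

One known ambiguity remains even here: a UTF-16 little-endian file whose first character is U+0000 starts with the same four bytes as a UTF-32 little-endian BOM, which is one more reason to let the user override the guess.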
Upvotes: 0