Reputation: 6952
When reading data from the input file I noticed that the ¥ symbol was not being read by the StreamReader. Mozilla Firefox showed the input file type as Western (ISO-8859-1).
After playing around with the encoding parameters I found it worked successfully for the following values:
System.Text.Encoding.GetEncoding(1252) // (western iso 88591)
System.Text.Encoding.Default
System.Text.Encoding.UTF7
Now I am planning on using the "Default" setting, however I am not very sure if this is the right decision. The existing code did not use any encoding and I am worried I might break something.
I know very little (or rather nothing) about encoding. How do I go about this? Is my decision to use System.Text.Encoding.Default safe? Should I be asking the user to save the files in a particular format?
Upvotes: 2
Views: 5821
Reputation: 86492
Are you a software developer? Then do not forget to read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
Upvotes: 1
Reputation: 124794
The existing code did not use any encoding
It may not have explicitly specified the encoding, in which case the encoding probably defaulted to Encoding.UTF8.
The name Encoding.Default might give the impression that this is the default encoding used by classes such as StreamReader, but this is not the case: As Jon Skeet pointed out, Encoding.Default is the encoding for the operating system's current ANSI code page.
Personally I think this makes the property name Encoding.Default somewhat misleading.
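To illustrate the point about the implicit UTF-8 default, here is a minimal sketch. The class and method names are hypothetical, and it assumes the file contains the single byte 0xA5, which is how ISO-8859-1 stores ¥. That byte on its own is not valid UTF-8, so a StreamReader created without an encoding cannot turn it into ¥, which would match the symptom described in the question.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class DefaultReaderDemo
{
    // Decodes the bytes the way a StreamReader does when no encoding is passed
    // (it assumes UTF-8 unless it finds a byte order mark).
    public static string ReadWithoutEncoding(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            return reader.ReadToEnd();
        }
    }

    // Decodes the bytes with an explicitly chosen encoding.
    public static string ReadWithEncoding(byte[] data, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(data), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumed file content: the yen sign as stored by ISO-8859-1.
        byte[] yenInLatin1 = { 0xA5 };

        string implicitResult = ReadWithoutEncoding(yenInLatin1);
        string explicitResult = ReadWithEncoding(yenInLatin1, Encoding.GetEncoding(28591));

        Console.WriteLine(implicitResult == "\u00A5"); // False: 0xA5 alone is not valid UTF-8
        Console.WriteLine(explicitResult == "\u00A5"); // True: decoded as ISO-8859-1
    }
}
```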
Upvotes: 2
Reputation: 1503140
Code page 1252 isn't quite the same as ISO-Latin-1. If you want ISO-Latin-1, use Encoding.GetEncoding(28591). However, I'd expect them to be the same for this code point (U+00A5). UTF-7 is completely different (and almost never what you want to use).
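The difference between the two code pages can be checked with a small sketch like the one below. The class name is hypothetical, and the sketch assumes .NET Core 3.0 or later (or .NET 5+), where code page 1252 only becomes available after registering CodePagesEncodingProvider; on .NET Framework the code page is available without that registration. Byte 0x80 is used as the comparison point because Windows-1252 maps it to the euro sign while ISO-8859-1 maps it to a control character.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class CodePageComparison
{
    static CodePageComparison()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy
        // Windows code pages such as 1252 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes one byte with the given code page and returns the resulting
    // UTF-16 code unit as four hex digits.
    public static string DecodeToHex(int codePage, byte value)
    {
        string decoded = Encoding.GetEncoding(codePage).GetString(new[] { value });
        return ((int)decoded[0]).ToString("X4");
    }

    public static void Main()
    {
        // The yen sign sits at 0xA5 in both code pages.
        Console.WriteLine(DecodeToHex(1252, 0xA5));  // 00A5
        Console.WriteLine(DecodeToHex(28591, 0xA5)); // 00A5

        // Byte 0x80 is where the two differ.
        Console.WriteLine(DecodeToHex(1252, 0x80));  // 20AC (euro sign)
        Console.WriteLine(DecodeToHex(28591, 0x80)); // 0080 (control character)
    }
}
```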
Encoding.Default is not safe - it's a really bad idea in most situations. It's specific to the particular computer you're running on. If you transfer a file from one computer to another, who knows what encoding the original computer was using?
If you know that your file is in ISO-8859-1, then explicitly use that. What's producing these files? If they're just being saved by the user, what program are they being saved in? If UTF-8 is an option, that's a good one - partly because it can cope with the whole of Unicode.
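As a rough illustration of why UTF-8 is the more forgiving choice, the sketch below round-trips a string through an encoding and checks whether it survives. The class name, method name, and sample text are hypothetical and not taken from the question.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingRoundTrip
{
    // Encodes the text to bytes and decodes it again with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Sample text: yen sign, euro sign, and two CJK characters.
        string original = "\u00A5 \u20AC \u65E5\u672C";

        // UTF-8 can represent every Unicode character, so nothing is lost.
        Console.WriteLine(RoundTrip(original, Encoding.UTF8) == original); // True

        // ISO-8859-1 has no euro sign or CJK characters, so the text is altered.
        Console.WriteLine(RoundTrip(original, Encoding.GetEncoding(28591)) == original); // False
    }
}
```

Whichever encoding the files really use, the same Encoding instance can be passed to the StreamReader constructor so the choice is explicit rather than machine-dependent.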
I have an article on Unicode and another on debugging Unicode issues which you may find useful.
Upvotes: 3