Reputation: 1039
StreamReader reads '–' (alt+ 0150) as � even if I have UTF-8 encoding and I have detectEncodingFromByteOrderMarks (BOM) set to true. Can any one guide me on this ?
Upvotes: 2
Views: 808
Reputation: 5733
As the other users wrote, the probable reason of this issue is an ANSI encoding of the file you are trying to read. I've recreated the issue you've described when I saved the file in ANSI encoding.
Try to use this code:
var stream = new StreamReader(fileName, Encoding.Default);
The Encoding.Default parameter is important in here. This code should read the character you've mentioned correctly.
Upvotes: 1
Reputation: 942518
That byte code won't appear in utf-8 encoded text. It is '\u2013', 0xe2 + 0x80 + 0x93 when encoded in utf-8. If you get this character when you type Alt+0150 on the numeric keypad then your default system code page is probably 1252. Simply pass Encoding.Default to the StreamReader constructor.
Upvotes: 4
Reputation: 63970
You need to know the encoding that was used to encode the text. There's no way around that. Try different encodings until you get the desired results.
From MSDN:
The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used. See the Encoding.GetPreamble method for more information.
Which means that using that BOM is just an extra thing that may or may not work or can be easily overriden
Upvotes: 2