Reputation: 11
I have a very simple c# console app that reads through a text file and outputs the same file but with a particular string replaced on each line that it appears - utilizing StreamReader and StreamWriter. I do not know the encoding of the source file. I have encountered a situation where there is a character in the file (ext ascii dec 166, broken pipe) that when running through this app gets "mangled" using the default encoding (In the output file it ends up as a "box" character). Since I do not know the source file encoding I have attempted multiple options to see what would provide an unaltered result and oddly the only way that works is having it read in UTF-7 and written in UTF-8.
UTF-7 to UTF-7 causes problems like & to change to +AC. UTF-8 to UTF-8 (which I believe is the default) converts the character in question to the "box". ASCII to ASCII turns it into ?. Unicode to Unicode results in gibberish. Shouldn't it be same encoding read and write for same results? Simplified code example below:
using (var fileStream = new FileStream(fileName, FileMode.Open))
using (var fileReader = new StreamReader(fileStream,Encoding.UTF7))
using (var fileStreamOut = new FileStream(tempFileName,FileMode.Create))
using (var fileWriter = new StreamWriter(fileStreamOut,Encoding.UTF8))
{
while (!fileReader.EndOfStream)
{
var inputLine = fileReader.ReadLine();
if (inputLine != null)
{
inputLine = inputLine.Substring(0, 3) + newRdfi + inputLine.Substring(12);
fileWriter.WriteLine(inputLine);
}
}
fileWriter.Flush();
}
Upvotes: 0
Views: 1253
Reputation: 11
After clarification on the file creation method received from the developer of the source system and knowledge of the server it is being produced on I came to the conclusion the encoding was Windows-1252. Changing my read and write streams to use Encoding.GetEncoding(1252) resulted in all characters reading and outputting as expected.
Upvotes: 1