Reputation: 2897
I'm having a silly problem. I read some .cs files from disk, run lots of regex and other operations on them with a .NET program I've made, and then write them back to disk.
The resulting files somehow end up with the wrong encoding. What encoding do C# source files use? And is the byte order mark at the start of the file needed? Does it get written when I use File.WriteAllText()?
The program changing the files is a simple .NET application, and the code is simply:
string text = System.IO.File.ReadAllText(fn);
string newText = Regex.Replace(text, regexStr, replaceStr);
System.IO.File.WriteAllText(fn, newText);
The C# files have comments and strings containing characters that don't seem to be part of the standard code page.
One of the problematic characters is "ä".
Solution:
This seems to work correctly:
string text = System.IO.File.ReadAllText(fn, Encoding.GetEncoding(1252));
string newText = Regex.Replace(text, regexStr, replaceStr);
System.IO.File.WriteAllText(fn, newText, Encoding.GetEncoding(1252));
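If some of the files carry a byte order mark and others do not, hard-coding one code page for all of them may still break a few. The following is only a sketch of one way around that, not a definitive implementation: the class EncodingPreservingRewrite and its Rewrite method are hypothetical names, and it assumes the caller knows which fallback encoding the BOM-less files use. It reads through a StreamReader that lets a BOM override the fallback, then writes back with whatever encoding the reader ended up using.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingPreservingRewrite
{
    // Hypothetical helper: reads the file, applies `transform`, and writes the
    // result back with the same encoding the file was read with.
    public static void Rewrite(string path, Encoding fallback, Func<string, string> transform)
    {
        string text;
        Encoding used;
        // Third argument (detectEncodingFromByteOrderMarks) = true:
        // a BOM, if present, overrides `fallback`.
        using (var reader = new StreamReader(path, fallback, true))
        {
            text = reader.ReadToEnd();
            // CurrentEncoding is only meaningful after the first read.
            used = reader.CurrentEncoding;
        }
        File.WriteAllText(path, transform(text), used);
    }
}
```

With the code from the question this would be called as Rewrite(fn, Encoding.GetEncoding(1252), t => Regex.Replace(t, regexStr, replaceStr)). Note that on .NET Core and .NET 5+ code page 1252 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).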
Upvotes: 4
Views: 3041
Reputation: 59346
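System.IO.File.ReadAllText(fn)
tries to detect the encoding of the input file from a byte order mark, and decodes the file as UTF-8 when there is none. For a file saved in an ANSI code page such as 1252, this can go horribly wrong.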
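To make the failure concrete, here is a small sketch (the class and method names are made up for illustration). It writes the byte 0xE4, which is "ä" in code page 1252, to a temporary file without a byte order mark and reads it back without specifying an encoding. Since a lone 0xE4 is not valid UTF-8, the character should come back as the replacement character U+FFFD, and the original "ä" is lost once the text is written out again.

```csharp
using System;
using System.IO;
using System.Text;

static class DefaultDecodingDemo
{
    // Hypothetical demo: simulates a BOM-less code page 1252 file containing
    // "//ä" and reads it the way the code in the question does.
    public static string ReadAnsiFileWithoutEncoding()
    {
        string path = Path.GetTempFileName();
        try
        {
            // 0x2F 0x2F = "//", 0xE4 = "ä" in Windows-1252.
            File.WriteAllBytes(path, new byte[] { 0x2F, 0x2F, 0xE4 });
            // No BOM present, so the bytes are decoded as UTF-8.
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Passing the file's actual encoding to ReadAllText, as in the question's own solution, avoids this.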
Visual Studio 2008 creates files in UTF-8 by default. Similarly, you should try to use UTF-8 wherever possible, by specifying Encoding.UTF8
when writing the files to disk.
Upvotes: 2
Reputation: 17059
By default the files should be encoded with the same code page that is set in the regional settings of the machine. By default this will be 'Unicode (UTF-8 with signature) - Codepage 65001', but you can use any code page you wish; for example, you could also use 'Western European (Windows) - Codepage 1252'.
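The "signature" in that name is the byte order mark, which also bears on the original question of whether File.WriteAllText() writes one. As a rough illustration (BomDemo and WriteAndReadBytes are hypothetical names), the sketch below writes "ä" to a temporary file and returns the raw bytes. As far as I know, the overload without an encoding argument writes UTF-8 with no BOM, Encoding.UTF8 adds the three-byte signature EF BB BF, and new UTF8Encoding(false) omits it.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Hypothetical demo: writes "ä" with the given encoding (or with the
    // default overload when `encoding` is null) and returns the file's bytes.
    public static byte[] WriteAndReadBytes(Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            // "\u00E4" is "ä"; the escape keeps this source file pure ASCII.
            if (encoding == null)
                File.WriteAllText(path, "\u00E4");
            else
                File.WriteAllText(path, "\u00E4", encoding);
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Calling WriteAndReadBytes(Encoding.UTF8) should give five bytes (the signature followed by C3 A4), while WriteAndReadBytes(null) and WriteAndReadBytes(new UTF8Encoding(false)) should give only the two bytes C3 A4.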
Upvotes: 1
Reputation: 4511
I've written a few code generators in my time and always used ASCII encoding (plain Windows text). What language are you using to do the regex operations on the .cs files?
Upvotes: 0