Reputation: 2897

Writing C# source code to files

I'm having a stupid problem. I'm reading some .cs files from disk, doing lots of regex and other operations on them with a .NET program I've made, and then writing them back to disk.

The resulting files get the wrong encoding somehow. What encoding do C# source files use? And then there is the byte order mark at the start of the file: is it needed? Does it get written when I use File.WriteAllText()?

The program changing the files is a simple .NET application, and the code is simply:

string text = System.IO.File.ReadAllText(fn);
string newText = Regex.Replace(text, regexStr, replaceStr);
System.IO.File.WriteAllText(fn, newText);

The C# files have comments and strings containing characters that don't seem to be part of the standard code page.

One of the problematic characters is "ä"

Solution:

This seems to work correctly:

string text = System.IO.File.ReadAllText(fn, Encoding.GetEncoding(1252));
string newText = Regex.Replace(text, regexStr, replaceStr);
System.IO.File.WriteAllText(fn, newText, Encoding.GetEncoding(1252));

Upvotes: 4

Answers (3)

David Schmitt

Reputation: 59346

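System.IO.File.ReadAllText(fn) tries to guess the encoding of the input file from its byte order mark, and falls back to UTF-8 when there is none. This can go horribly wrong: a file saved in code page 1252 has no byte order mark, so a character like "ä" (a single byte, 0xE4) is not valid UTF-8 and gets replaced on reading.

Visual Studio 2008 creates files in UTF-8 by default. Similarly, you should try to use UTF-8 wherever possible, by specifying Encoding.UTF8 (or a new UTF8Encoding instance) when writing the files to disk.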
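
If the files are a mix of encodings, one option is to let a StreamReader detect the byte order mark and then write the file back with whatever encoding it reports. The following is only a minimal sketch: SourceRewriter and RewritePreservingEncoding are made-up names, and the fallback parameter is an assumption about what your files without a byte order mark contain (for example Encoding.GetEncoding(1252) from the question).

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
static class SourceRewriter
{
    // Rewrites the file in place and returns the encoding that was used.
    // 'fallback' is only used when the file has no byte order mark.
    public static Encoding RewritePreservingEncoding(
        string path, string pattern, string replacement, Encoding fallback)
    {
        string text;
        Encoding detected;

        // The third argument enables detection from the byte order mark.
        using (var reader = new StreamReader(path, fallback, true))
        {
            text = reader.ReadToEnd();
            // CurrentEncoding is only reliable after the first read.
            detected = reader.CurrentEncoding;
        }

        string newText = Regex.Replace(text, pattern, replacement);

        // WriteAllText emits the preamble (byte order mark) of the given
        // encoding, if that encoding has one.
        File.WriteAllText(path, newText, detected);
        return detected;
    }
}
```

A file that starts with the UTF-8 byte order mark is read and written as UTF-8 with the mark kept, while a file without one round-trips through the fallback. Note that a UTF-8 file saved without a byte order mark would also be decoded with the fallback, so this only helps if you know what your unmarked files contain.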

Upvotes: 2

Fraser

Reputation: 17059

By default, the files should be encoded with the code page that is set in the regional settings of the machine. By default this will be 'Unicode (UTF-8 with signature) - Codepage 65001', but you can use any code page you wish; for example, you could also use 'Western European (Windows) - Codepage 1252'.
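
The "signature" here is the byte order mark the question asks about. Whether File.WriteAllText(path, text, encoding) writes one depends on the preamble of the Encoding instance you pass in. A small sketch to check this (BomInfo and PreambleLength are made-up names for illustration):

```csharp
using System.Text;

// Hypothetical helper, not part of the framework.
static class BomInfo
{
    // Number of signature bytes the encoding puts at the start of a file.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }
}
```

BomInfo.PreambleLength(Encoding.UTF8) gives 3 (the bytes EF BB BF), while BomInfo.PreambleLength(new UTF8Encoding(false)) gives 0, so the latter writes UTF-8 without a signature. Single-byte code pages such as 1252 have no byte order mark at all.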

Upvotes: 1

Mauro

Reputation: 4511

I've written a few code generators in my time and always used ASCII encoding (plain Windows text). What language are you using to do the regex operations on the .cs files?

Upvotes: 0
