Reputation: 10627
My source XML has the copyright character in it as &#x00A9;. When writing the XML with this code:
var stringWriter = new StringWriter();
segmentDoc.Save(stringWriter);
Console.WriteLine(stringWriter.ToString());
it is rendering that copyright character as a little "c" with a circle around it. I'd like to preserve the original code so it gets spit back out as &#x00A9;. How can I do this?
Update: I also noticed that the source declaration looks like <?xml version="1.0" encoding="utf-8"?> but my saved output looks like <?xml version="1.0" encoding="utf-16"?>. Can I indicate that I want the output to still be utf-8? Would that fix it?
Update2: Also,  
is getting output as ÿ. I definitely don't want that happening!
Update3: &#x00A7; is becoming a little box and that is wrong, too. It should be &#x00A7;
Upvotes: 2
Views: 4490
Reputation: 11
I had the same problem when saving some Lithuanian characters this way. I found a way to cheat around it by replacing & with &amp; (writing &amp;#x00A9; to get &#x00A9;, and so on). It looks strange, but it worked for me :)
Upvotes: 1
Reputation: 25652
It appears that UTF8 won't solve the problem. The following has the same symptoms as your code:
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding());
segmentDoc.Save(writer);
ms.Seek(0L, SeekOrigin.Begin);
var reader = new StreamReader(ms);
var result = reader.ReadToEnd();
Console.WriteLine(result);
I tried the same approach with ASCII, but wound up with ? instead of ©.
I think using a string replace after converting the XML to a string is your best bet to get the effect you want. Of course, that could be cumbersome if you are interested in more than just the &copy; symbol.
result = result.Replace("©", "\u0026#x00A9;");
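If more than one character needs this treatment, the single Replace call above could be generalized. The following is only a rough sketch: XmlEscaper and EscapeNonAscii are hypothetical names, not part of any library, and it assumes non-ASCII characters only occur in text and attribute values (numeric references are not allowed in element names and are not interpreted inside comments or CDATA sections).

```csharp
using System;
using System.Text;

public static class XmlEscaper
{
    // Hypothetical helper: rewrites every non-ASCII character in an already
    // serialized XML string as a hexadecimal numeric character reference.
    // Assumes the input is well-formed UTF-16 (no lone surrogates).
    public static string EscapeNonAscii(string xml)
    {
        var sb = new StringBuilder(xml.Length);
        for (int i = 0; i < xml.Length; i++)
        {
            char c = xml[i];
            if (c < 128)
            {
                sb.Append(c); // plain ASCII is copied through unchanged
                continue;
            }

            int codePoint = c;
            if (char.IsSurrogatePair(xml, i))
            {
                // Characters outside the BMP take two chars; emit one reference.
                codePoint = char.ConvertToUtf32(xml, i);
                i++;
            }

            sb.Append("&#x").Append(codePoint.ToString("X4")).Append(';');
        }
        return sb.ToString();
    }
}
```

With this, result = XmlEscaper.EscapeNonAscii(result); turns "<a>© 2010</a>" into "<a>&#x00A9; 2010</a>".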
Upvotes: 0
Reputation: 1500525
I strongly suspect you won't be able to do this. Fundamentally, the copyright sign is &#x00A9;; they're different representations of the same thing, and I expect that the in-memory representation normalizes this.
What are you doing with the XML afterwards? Any sane application processing the resulting XML should be fine with it.
You may be able to persuade it to use the entity reference if you explicitly encode it with ASCII... but I'm not sure.
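One way to test the ASCII idea is to save through an XmlWriter created over a stream with ASCII in its settings. This is a sketch, not something I've verified against your document: it assumes segmentDoc is an XDocument, and AsciiXml and SaveAsAscii are made-up names. As far as I know, a writer created this way falls back to numeric character references for anything ASCII can't represent, though the reference may come back in a shorter form than the one in your source.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class AsciiXml
{
    // Hypothetical helper: serializes the document through an XmlWriter whose
    // settings ask for ASCII output on a stream.
    public static string SaveAsAscii(XDocument doc)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            } // disposing the writer flushes its output into the stream

            // The bytes are pure ASCII, so decoding them as ASCII is lossless.
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

Calling Console.WriteLine(AsciiXml.SaveAsAscii(segmentDoc)); would then print the ASCII-only form.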
EDIT: You can definitely make it use a different encoding. You just need a StringWriter which reports that its "native" encoding is UTF-8. Here's a simple class you can use for that:
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}
You could try changing it to use Encoding.ASCII as well and see what that does to the copyright sign...
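For completeness, here is roughly how the writer would be plugged into the code from the question. This is a sketch assuming segmentDoc is an XDocument; Utf8Xml and SaveAsUtf8String are hypothetical names. Note that it only changes the encoding named in the XML declaration: UTF-8 can represent ©, so the character itself is still written literally.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

// StringWriter that reports UTF-8 so the XML declaration says utf-8.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class Utf8Xml
{
    // Hypothetical helper: saves the document to a string whose declaration
    // names utf-8 instead of the default utf-16.
    public static string SaveAsUtf8String(XDocument doc)
    {
        using (var writer = new Utf8StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

In the original snippet, that amounts to replacing new StringWriter() with new Utf8StringWriter().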
Upvotes: 4
Reputation: 3436
Maybe you can try a different document encoding; check out: http://www.sagehill.net/docbookxsl/CharEncoding.html
Upvotes: 0