Reputation: 179
I'm working on an application in C# which pulls user data from Active Directory (using DirectorySearcher) and posts it to a remote site using a REST API. But some names contain special characters such as ØÆÅ, and I can't figure out how to encode them properly. The API expects to receive them encoded as numeric character references, e.g. &#230; for æ.
The following is a test stub:
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.IO;

namespace Encodingtest
{
    class Program
    {
        static void Main(string[] args)
        {
            XmlWriterSettings xws = new XmlWriterSettings();
            xws.Encoding = Encoding.UTF8;
            StringWriter sw = new StringWriter();
            using (XmlWriter xw = XmlWriter.Create(sw, xws))
            {
                xw.WriteStartElement("test");
                xw.WriteElementString("element", "test øæåØÆÅ");
                xw.WriteEndElement();
                xw.Flush();
                xw.Close();
            }
            Console.WriteLine(sw.ToString());
            Console.ReadLine();
        }
    }
}
The problem is that the output is still in the same format as the input: readable Danish characters rather than their numeric character references.
The REST API is a Rails app, by the way. I assume that any string data in the C# app is Unicode by default.
Any help and hints are greatly appreciated.
Cheers
Upvotes: 2
Views: 3767
Reputation: 66783
Any system processing XML should be able to handle UTF-8 character sets, especially if the encoding is explicitly declared as UTF-8. Those characters should not have to be encoded as numeric entity references.
If you want to ensure that those characters are serialized with numeric entities, then set your encoding to a smaller character set, like ascii or us-ascii.
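In your code, change: xws.Encoding = Encoding.UTF8;
to: xws.Encoding = Encoding.ASCII;
Since those characters are outside of the ascii character set, they will be serialized as numeric character references. Note that XmlWriterSettings.Encoding is only honored when the XmlWriter is created over a Stream or a file name; when it wraps a TextWriter such as the StringWriter in your stub, the TextWriter's own encoding (UTF-16 for StringWriter) is used instead, so you would also need to write to a stream.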
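A minimal sketch of that change applied to a stream-backed writer, in case it helps. The class and method names (AsciiXmlDemo, WriteAscii) are made up for illustration, and the use of a MemoryStream is an assumption about how you would collect the output before posting it:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class AsciiXmlDemo
{
    // Hypothetical helper: serializes a small document to a stream as ASCII,
    // so the writer has to emit character references for anything above U+007F.
    public static string WriteAscii(string value)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var ms = new MemoryStream())
        {
            // Creating the writer over a Stream (not a TextWriter) lets
            // settings.Encoding take effect.
            using (XmlWriter xw = XmlWriter.Create(ms, settings))
            {
                xw.WriteStartElement("test");
                xw.WriteElementString("element", value);
                xw.WriteEndElement();
            }
            // The bytes are pure ASCII at this point, so decoding is lossless.
            return Encoding.ASCII.GetString(ms.ToArray());
        }
    }

    static void Main()
    {
        Console.WriteLine(WriteAscii("test øæåØÆÅ"));
    }
}
```

As far as I know the writer emits hexadecimal references here (for example &#xE6; for æ) rather than the decimal form &#230;. Any conforming XML parser treats the two as equivalent, but it is worth checking that the receiving API does too.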
Upvotes: 1
Reputation: 30205
Perhaps just resort to your own "numeric character reference" generator:
foreach (char c in "test øæåØÆÅ")
{
    // Emit a decimal numeric character reference for anything outside ASCII.
    string encoding = (int)c >= 0x80 ? String.Format("&#{0};", (int)c) : c.ToString();
    Console.Write(encoding);
}
The above code produces the output "test &#248;&#230;&#229;&#216;&#198;&#197;", which matches that found with an online converter.
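One caveat with a char-by-char loop is that characters outside the Basic Multilingual Plane are stored as two UTF-16 surrogates and would come out as two separate references. A rough sketch of a variant that works on whole code points follows; the NumericReferences class and its Encode method are hypothetical names, not part of any library:

```csharp
using System;
using System.Text;

static class NumericReferences
{
    // Hypothetical helper: replaces every code point above U+007F with a
    // decimal numeric character reference such as &#230;.
    public static string Encode(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            // ConvertToUtf32 combines a surrogate pair into one code point
            // (it throws ArgumentException on an unpaired surrogate).
            int codePoint = char.ConvertToUtf32(input, i);
            if (char.IsSurrogatePair(input, i))
            {
                i++; // skip the low surrogate that was just consumed
            }

            if (codePoint >= 0x80)
            {
                sb.Append("&#").Append(codePoint).Append(';');
            }
            else
            {
                sb.Append((char)codePoint);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Encode("test øæåØÆÅ"));
    }
}
```

Bear in mind that passing the result to WriteElementString would escape each ampersand again (as &amp;), so a helper like this only suits places where the text is written out without further XML escaping.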
Upvotes: 0