Reputation: 1364
This one's puzzling me no end. I need to send an XML message to a Russian web service. The XML has to be encoded in windows-1251.
I have a number of objects that respond to the different types of messages and I turn them into XML thus:
public string Serialise(Type t, object o, XmlSerializerNamespaces Namespaces)
{
    XmlSerializer serialiser = _serialisers.First(s => s.GetType().FullName.Contains(t.Name));
    Windows1251StringWriter myWriter = new Windows1251StringWriter();
    serialiser.Serialize(myWriter, o, Namespaces);
    return myWriter.ToString();
}

public class Windows1251StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.GetEncoding(1251); }
    }
}
This works fine, but the web service rejects requests if we send any characters that aren't in windows-1251. In the latest example I tried to send a phone number containing 'LEFT-TO-RIGHT EMBEDDING' (U+202A), 'NON-BREAKING HYPHEN' (U+2011) and, god help us, 'POP DIRECTIONAL FORMATTING' (U+202C). I have no control over the input. I'd like to turn any unknown characters into ? or remove them. I've tried messing with the EncoderFallback, but it doesn't seem to change anything.
Am I going about this wrong?
Upvotes: 0
Views: 1250
Reputation: 116826
Since you are serializing to a string, the only thing the Encoding property in Windows1251StringWriter does for you is to change the name of the encoding shown in the XML:
<?xml version="1.0" encoding="windows-1251"?>
(I think this trick comes from here.)
And that's it. All C# strings are always encoded in UTF-16, and the base class StringWriter writes to this encoding no matter what, regardless of whether the Encoding property is overridden.
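To see this for yourself, here is a minimal sketch. The Phone class and DeclarationOnlyDemo are made up for illustration, and the RegisterProvider call assumes .NET Core 3.0 or later (drop it on .NET Framework, where code page 1251 is available out of the box). The returned string should carry the windows-1251 declaration and still contain the U+2011 character untouched:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Same writer as in the question: only the reported encoding name changes.
public class Windows1251StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.GetEncoding(1251); }
    }
}

// Hypothetical message type, used only for this demonstration.
public class Phone
{
    public string Number { get; set; }
}

public static class DeclarationOnlyDemo
{
    public static string Serialise()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy
        // code pages must be registered before GetEncoding(1251) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // U+2011 (NON-BREAKING HYPHEN) does not exist in windows-1251.
        var phone = new Phone { Number = "495\u2011123" };

        var writer = new Windows1251StringWriter();
        new XmlSerializer(typeof(Phone)).Serialize(writer, phone);

        // The result is an ordinary UTF-16 string; nothing was encoded.
        return writer.ToString();
    }
}
```

Nothing ever passes through a windows-1251 encoder here, which is why changing the EncoderFallback has no visible effect.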
To strip away characters from your XML that are invalid in some specific encoding, you need to encode it down to a byte stream, then decode it, e.g. with the following:
public static class XmlSerializationHelper
{
    public static string GetXml<X>(this X toSerialize, XmlSerializer serializer = null, XmlSerializerNamespaces namespaces = null, Encoding encoding = null)
    {
        if (toSerialize == null)
            throw new ArgumentNullException(nameof(toSerialize));
        encoding = encoding ?? Encoding.UTF8;
        serializer = serializer ?? new XmlSerializer(toSerialize.GetType());
        using (var stream = new MemoryStream())
        using (var writer = new StreamWriter(stream, encoding))
        {
            // Writing through the StreamWriter encodes the XML to bytes,
            // so the encoder fallback is applied to unmappable characters.
            serializer.Serialize(writer, toSerialize, namespaces);
            writer.Flush();
            stream.Position = 0;
            // Decode the bytes back into a string with the same encoding.
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
Then call it like this:
var encoding = Encoding.GetEncoding(1251, new EncoderReplacementFallback(""), new DecoderExceptionFallback());
return o.GetXml(serialiser, Namespaces, encoding);
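The encode-then-decode step can also be tried in isolation on a plain string. The following is a small sketch, not part of the helper above: FallbackDemo and RoundTrip are names invented for this example, and the RegisterProvider call again assumes .NET Core 3.0 or later. Passing "?" instead of "" as the replacement gives the other behaviour asked about in the question:

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Encodes the input to windows-1251 bytes and decodes it again.
    // Characters that have no windows-1251 mapping are swapped for
    // 'replacement' (use "" to drop them, "?" to mark them).
    public static string RoundTrip(string input, string replacement)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var encoding = Encoding.GetEncoding(
            1251,
            new EncoderReplacementFallback(replacement),
            new DecoderExceptionFallback());

        byte[] bytes = encoding.GetBytes(input);
        return encoding.GetString(bytes);
    }
}
```

For the phone number from the question, FallbackDemo.RoundTrip("\u202A+7\u2011495\u202C", "") should return "+7495", and the same call with "?" should return "?+7?495?". Cyrillic text, which windows-1251 does cover, passes through unchanged.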
Upvotes: 2