Reputation: 4811
I have an XML file that contains a special character (`&` is the character causing issues), and I use the code below to deserialize the XML:
XMLDATAMODEL imported_data;
// Create an instance of the XmlSerializer specifying type and namespace.
XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));
// A FileStream is needed to read the XML document.
FileStream fs = new FileStream(path, FileMode.Open);
XmlReader reader = XmlReader.Create(fs);
// Use the Deserialize method to restore the object's state.
imported_data = (XMLDATAMODEL)serializer.Deserialize(reader);
fs.Close();
The structure of my XML model looks like this:
[XmlRoot(ElementName = "XMLDATAMODEL")]
public class XMLDATAMODEL
{
[XmlElement(ElementName = "EventName")]
public string EventName { get; set; }
[XmlElement(ElementName = "Location")]
public string Location { get; set; }
}
I also tried this code, with the encoding specified explicitly, but with no success:
// Declare an object variable of the type to be deserialized.
StreamReader streamReader = new StreamReader(path, System.Text.Encoding.UTF8, true);
XmlSerializer serializer = new XmlSerializer(typeof(XMLDATAMODEL));
imported_data = (XMLDATAMODEL)serializer.Deserialize(streamReader);
streamReader.Close();
Both approaches failed. If I put the special character inside CDATA, it seems to work. How can I make it work for XML data without CDATA as well?
Here is my XML file content
The error I am getting is: There is an error in XML document (2, 17).
Upvotes: 3
Views: 12588
Reputation: 30813
The best answer I could find after looking around is that, unless you serialize the data yourself, it will be pretty troublesome to deserialize XML with special characters.
For your case, since the special character is `&`, you should convert it to `&amp;` before you can deserialize it.
Unless the character `&` is converted to `&amp;`, we cannot really deserialize it with XmlSerializer. Yes, we can still read it by using
XmlReaderSettings settings = new XmlReaderSettings();
settings.CheckCharacters = false; // Turn off the reader's character checking.
FileStream fs = new FileStream(xmlfolder + "\\xmltest.xml", FileMode.Open);
XmlReader reader = XmlReader.Create(fs, settings);
But we cannot deserialize it.
As for how to convert `&` to `&amp;`, there are various ways, each with pluses and minuses. The bottom line for all of them is: do not use the stream directly. Read the data from the file into a string (for example, with File.ReadAllText) and do the string processing on that. After that, convert the result to a MemoryStream and start the deserialization.
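A minimal sketch of that flow, assuming the file is UTF-8 (or declares no encoding) and reusing the XMLDATAMODEL class from the question. The names XmlStringLoader, DeserializeFromText, LoadFromFile and the preprocess delegate are made up for illustration:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[XmlRoot(ElementName = "XMLDATAMODEL")]
public class XMLDATAMODEL
{
    [XmlElement(ElementName = "EventName")]
    public string EventName { get; set; }

    [XmlElement(ElementName = "Location")]
    public string Location { get; set; }
}

// Hypothetical helper, not part of any library.
public static class XmlStringLoader
{
    // Deserialize XML text that has already been cleaned up.
    // Assumption: the text is encoded as UTF-8 when turned into bytes,
    // so an XML declaration naming another encoding would not match.
    public static XMLDATAMODEL DeserializeFromText(string xmlText)
    {
        var serializer = new XmlSerializer(typeof(XMLDATAMODEL));
        byte[] bytes = Encoding.UTF8.GetBytes(xmlText);
        using (var stream = new MemoryStream(bytes))
        {
            return (XMLDATAMODEL)serializer.Deserialize(stream);
        }
    }

    // Read the whole file as a string, run the caller's string
    // processing on it, then deserialize the processed text.
    public static XMLDATAMODEL LoadFromFile(string path, Func<string, string> preprocess)
    {
        string raw = File.ReadAllText(path);
        return DeserializeFromText(preprocess(raw));
    }
}
```

You would call it as `XmlStringLoader.LoadFromFile(path, text => /* your fix-up */ text)`, plugging in whichever string processing you settle on.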
Now, for the string processing before deserialization, there are a couple of ways to do it.
The easiest, though often the least safe, is string.Replace("&", "&amp;"). It is unsafe because it also rewrites any `&` that is already part of an entity, so an existing `&amp;` becomes `&amp;amp;`.
The other way, harder but safer, is to use Regex. Since in your case the problem text is something that could sit inside CDATA, this could be a good way too.
Another way, harder still but safer, is to write your own parsing that goes through the text line by line.
I have yet to find a common, safe way to do this conversion.
For your example, though, string.Replace would work. You could also exploit the pattern (something inside CDATA) and use Regex.
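As a rough illustration of the Regex route, here is a sketch that escapes only the ampersands that do not already start an entity. The class name AmpersandFixer and the pattern itself are my own assumptions, not something from the question. The pattern leaves the five predefined entities and numeric character references alone and escapes every other `&`:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AmpersandFixer
{
    // Matches an '&' that is NOT followed by a predefined entity name
    // or a decimal/hex character reference ending in ';'.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)",
        RegexOptions.Compiled);

    // Replace every bare '&' with '&amp;' and leave existing entities as-is.
    public static string EscapeBareAmpersands(string xmlText)
    {
        return BareAmpersand.Replace(xmlText, "&amp;");
    }
}
```

Unlike a plain string.Replace, this does not double-escape text that is already correct. It does not know about CDATA sections, though, so an `&` inside CDATA would be escaped as well; if your files mix both styles you would need to handle that separately.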
Edit:
As for which characters are considered special in XML and how to process them beforehand: according to this, non-Roman characters are included.
Apart from the non-Roman characters, there are 5 special characters listed here:
< -> &lt;
> -> &gt;
" -> &quot;
' -> &apos;
& -> &amp;
And from here, we get one more:
% -> &#37;
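If you control the code that writes the values into the XML, the five predefined entities above do not have to be produced by hand. A small sketch using System.Security.SecurityElement.Escape, which handles exactly those five characters; the wrapper name XmlTextEscaper is hypothetical:

```csharp
using System.Security;

// Hypothetical wrapper, shown only to illustrate the mapping above.
public static class XmlTextEscaper
{
    // Escapes <, >, ", ' and & in a raw value so it can be placed
    // inside an XML element or attribute. It does not touch '%'.
    public static string EscapeForXml(string rawValue)
    {
        return SecurityElement.Escape(rawValue);
    }
}
```

This is meant for raw values before they go into the document. Running it over a whole XML file would escape the markup itself.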
Hope they can help you!
Upvotes: 9