Adnand

Reputation: 562

XML Unicode deserialization

I have the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <student name="Adnand"/>
     <student name="özil"/>
     <student name="ärnold"/>
</students>

As you can see, the file declares UTF-8 encoding, and some of the names contain non-ASCII characters (ö, ä).

I use the following code to deserialize this XML:

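// Requires: using System.IO; using System.Xml.Serialization;
public void readXML(string path)
{
    XmlSerializer deserializer = new XmlSerializer(typeof(Students));
    // Dispose the reader so the file handle is released after reading.
    using (TextReader reader = new StreamReader(path))
    {
        Students myStudents = (Students)deserializer.Deserialize(reader);
    }
}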

The deserialization itself works, but the special characters are shown as the � symbol. I tried changing the encoding type, but nothing changed. What alternatives do I have?
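The � symbol is U+FFFD, the replacement character a UTF-8 decoder emits for bytes that are not valid UTF-8. A minimal sketch that reproduces the symptom without any files, assuming the file was saved in a single-byte Western encoding. ISO-8859-1 stands in for the Windows ANSI code page here (both store ö as the single byte 0xF6), and the class and method names are made up for illustration:

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Stand-in for the Windows ANSI code page: both encode 'ö' (U+00F6) as the single byte 0xF6.
    public static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: encode text as a single-byte editor would save it,
    // then decode those bytes with the given encoding.
    public static string RoundTrip(string text, Encoding decodeWith)
    {
        byte[] onDisk = Latin1.GetBytes(text);
        return decodeWith.GetString(onDisk);
    }

    public static void Main()
    {
        // 0xF6 is never valid in UTF-8, so the decoder substitutes U+FFFD for it.
        Console.WriteLine(RoundTrip("\u00F6zil", Encoding.UTF8) == "\uFFFDzil"); // True
        // Decoding with the encoding the bytes were actually written in recovers the text.
        Console.WriteLine(RoundTrip("\u00F6zil", Latin1) == "\u00F6zil");        // True
    }
}
```

In other words, the text is not lost when it is written; it is only misread because the reader's encoding does not match the bytes on disk.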

ANSWER: Specifying Encoding.Default when creating the StreamReader solved it:

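// Requires: using System.IO; using System.Text; using System.Xml.Serialization;
public void readXML(string path)
{
    XmlSerializer deserializer = new XmlSerializer(typeof(Students));
    // Decode with the system's default encoding instead of UTF-8.
    using (TextReader reader = new StreamReader(path, Encoding.Default))
    {
        Students myStudents = (Students)deserializer.Deserialize(reader);
    }
}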

Upvotes: 2

Views: 2567

Answers (3)

Freggar

Reputation: 1056

It seems your file is not encoded as UTF-8 but in Windows' default ANSI code page (for Western European locales, typically Windows-1252).

Defining the StreamReader as

TextReader reader = new StreamReader(path, Encoding.Default);

should do the trick.


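Note that this is more of a workaround, and relying on Encoding.Default is actually a very bad idea: on .NET Framework its value is the system's ANSI code page, which depends on the machine's regional settings, so the same file can be decoded differently on another machine. On .NET Core and .NET 5+, Encoding.Default always returns UTF-8, so the workaround does not help there at all. This article gives a nice overview of why you should not use Encoding.Default (thanks to Alexander for sharing). It's better to use UTF-8, since most systems can deal with it.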

To actually save the file as UTF-8 in your case, you can either:

  • Adapt the program that creates the file to output it as UTF-8

  • Or, if you created the file in a text editor, re-save it with an editor that supports UTF-8 (e.g. Notepad++).
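For a file that already exists, the conversion can also be done once in code: read it with the encoding it was really saved in and write it back as UTF-8. A minimal sketch, assuming the source is a Western single-byte encoding; ISO-8859-1 is used because it is built into every .NET version and agrees with Windows-1252 on letters such as ö and ä. `Utf8Converter.ConvertToUtf8` is a hypothetical helper, not a framework API:

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Converter
{
    // Re-encodes a text file in place: decode with the encoding it was actually saved in,
    // then write it back as UTF-8 without a byte order mark.
    public static void ConvertToUtf8(string path, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(path, sourceEncoding);
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: Utf8Converter <file>");
            return;
        }
        // Assumption: the legacy file uses a Western single-byte encoding.
        ConvertToUtf8(args[0], Encoding.GetEncoding("ISO-8859-1"));
    }
}
```

Once the file really is UTF-8, the original StreamReader without an explicit encoding reads it correctly, and the XML declaration matches the bytes.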

Upvotes: 3

Marco Salerno

Reputation: 5203

Using LINQ to XML instead of XmlSerializer, this works for me:

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<Student> students = new List<Student>();
        XDocument xDocument = XDocument.Load("icsemmelle.xml");
        List<XElement> xStudents = xDocument.Descendants("student").ToList();
        foreach(XElement xStudent in xStudents)
        {
            students.Add(new Student { Name = xStudent.Attribute("name").Value });
        }
    }
}

class Student
{
    public string Name { get; set; }
}

Upvotes: 0

Werme

Reputation: 9

You can use a StreamReader to specify the encoding:

Students xmlObject = null;
// inXML is the path to the XML file; the third argument lets a byte order mark override Encoding.UTF8
using (var streamReader = new StreamReader(inXML, Encoding.UTF8, true)) {
    var xmlSerializer = new XmlSerializer(typeof(Students));
    xmlObject = (Students)xmlSerializer.Deserialize(streamReader);
}

Also, have you tried the "ISO-8859-1" encoding? I use it mostly for foreign characters.
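A self-contained sketch of that suggestion with XmlSerializer. Because the serializer reads from a TextReader, the characters are already decoded by the StreamReader, and in practice the `encoding="UTF-8"` attribute in the XML declaration is not used to re-decode them. The `Students`/`Student` shapes below are hypothetical, since the question does not show its classes, and `Latin1Xml.Read` is an illustrative helper name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical classes matching the question's XML; the real Students type isn't shown.
[XmlRoot("students")]
public class Students
{
    [XmlElement("student")]
    public List<Student> Items { get; set; } = new List<Student>();
}

public class Student
{
    [XmlAttribute("name")]
    public string Name { get; set; }
}

public static class Latin1Xml
{
    // Deserializes a stream whose bytes are ISO-8859-1, whatever the XML declaration claims.
    public static Students Read(Stream input)
    {
        var serializer = new XmlSerializer(typeof(Students));
        using (var reader = new StreamReader(input, Encoding.GetEncoding("ISO-8859-1")))
        {
            return (Students)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Simulate a file saved in a single-byte encoding while declaring UTF-8.
        string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                   + "<students><student name=\"\u00F6zil\"/><student name=\"\u00E4rnold\"/></students>";
        byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(xml);

        Students result = Read(new MemoryStream(bytes));
        foreach (Student s in result.Items)
        {
            Console.WriteLine(s.Name);
        }
    }
}
```

Once the file is actually saved as UTF-8, passing Encoding.UTF8 (or no encoding at all) to the StreamReader is the safer choice.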

Upvotes: 0
