anon_945500

Reputation: 269

Parsing complex XML with CDATA in C# in a Windows Phone 8 app

I am trying to read some XML data in my Windows Phone 8 app. The text content of the elements is wrapped in CDATA sections. Here is a sample of the data:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE HolyQuran [
<!ATTLIST HolyQuran TranslationID CDATA #REQUIRED>
<!ATTLIST HolyQuran Writer CDATA #REQUIRED>
<!ATTLIST HolyQuran Language CDATA #REQUIRED>
<!ATTLIST HolyQuran LanguageIsoCode CDATA #REQUIRED>
<!ATTLIST HolyQuran Direction (rtl|ltr) #REQUIRED>
<!ELEMENT HolyQuran (Chapter+)>
<!ATTLIST Chapter ChapterID CDATA #REQUIRED>
<!ATTLIST Chapter ChapterName CDATA #REQUIRED>
<!ELEMENT Chapter (Verse+)>
<!ATTLIST Verse VerseID CDATA #REQUIRED>
<!ELEMENT Verse (#PCDATA)>
  ]>
<!-- This SQL Query Generated at 22 November 2013 01:44 (UTC) from
  www.qurandatabase.org -->
<HolyQuran TranslationID="59" Writer="Yusuf Ali" Language="English"
    LanguageIsoCode="eng" Direction="ltr">
<Chapter ChapterID="1" ChapterName="The Opening">
    <Verse VerseID="1"><![CDATA[In the name of Allah, Most Gracious, Most
                          Merciful.]]></Verse>
    <Verse VerseID="2"><![CDATA[Praise be to Allah, the Cherisher and Sustainer
                          of the worlds;]]></Verse>
    <Verse VerseID="3"><![CDATA[Most Gracious, Most Merciful;]]></Verse>
    <Verse VerseID="4"><![CDATA[Master of the Day of Judgment.]]></Verse>
    <Verse VerseID="5"><![CDATA[Thee do we worship, and Thine aid we seek.
                         ]]></Verse>
    <Verse VerseID="6"><![CDATA[Show us the straight way,]]></Verse>
    <Verse VerseID="7"><![CDATA[The way of those on whom Thou hast bestowed Thy
                         Grace, those whose (portion) is not wrath, and who go
                         not astray.]]></Verse>
</Chapter>
<Chapter ChapterID="114" ChapterName="The Men">
<Verse VerseID="1"><![CDATA[Say: I seek refuge with the Lord and Cherisher 
             of Mankind,]]></Verse>
<Verse VerseID="2"><![CDATA[The King (or Ruler) of Mankind,]]></Verse>
<Verse VerseID="3"><![CDATA[The god (or judge) of Mankind,-]]></Verse>
<Verse VerseID="4"><![CDATA[From the mischief of the Whisperer (of Evil), who 
                         withdraws (after his whisper),-]]></Verse>
<Verse VerseID="5"><![CDATA[(The same) who whispers into the hearts of Mankind,-]]>  
    </Verse>
<Verse VerseID="6"><![CDATA[Among Jinns and among men.]]></Verse>
</Chapter>
</HolyQuran>

I want a data structure that contains the whole book, with a sub-structure per chapter holding the ChapterName, the ChapterID, and a list of all the verse contents with their corresponding VerseIDs for that chapter. Please note that by verse content, I mean the CDATA text. I need to use XDocument, but I cannot figure out how to parse this complex XML.

I will greatly appreciate any help!

Thanks!

Upvotes: 0

Views: 211

Answers (1)

Thomas Levesque

Reputation: 292565

The easiest way is to use XML serialization: define classes that match the structure of the XML document, decorate their properties with attributes that map them to XML attributes and elements, and use the XmlSerializer class to parse the input.

In your case, the classes would look like this:

using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;

public class HolyQuran
{
    [XmlAttribute]
    public int TranslationID { get; set; }
    [XmlAttribute]
    public string Writer { get; set; }
    [XmlAttribute]
    public string Language { get; set; }
    // The XML attribute is named LanguageIsoCode, so map it explicitly
    [XmlAttribute("LanguageIsoCode")]
    public string LangIsoCode { get; set; }
    [XmlAttribute]
    public string Direction { get; set; }
    [XmlElement("Chapter")]
    public List<Chapter> Chapters { get; set; }
}

public class Chapter
{
    [XmlAttribute]
    public int ChapterID { get; set; }
    [XmlAttribute]
    public string ChapterName { get; set; }
    [XmlElement("Verse")]
    public List<Verse> Verses { get; set; }
}

public class Verse
{
    // XML names are case-sensitive: the attribute is VerseID, not VerseId
    [XmlAttribute("VerseID")]
    public int VerseId { get; set; }
    [XmlText]
    public string Text { get; set; }
}

And you can use the following code to parse the file:

static HolyQuran LoadQuran(string path)
{
    var readerSettings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
    using (var reader = XmlReader.Create(path, readerSettings))
    {
        var xs = new XmlSerializer(typeof(HolyQuran));
        return (HolyQuran)xs.Deserialize(reader);
    }
}

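You don't have to do anything special to parse the CDATA sections; the XmlReader already knows how to handle them. Their content ends up in the Text property as ordinary text, including any line breaks and indentation inside the section.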
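
If you would rather stay with XDocument, as the question asks, a minimal LINQ to XML sketch could look like the following. The names VerseEntry, ChapterEntry, QuranXml, ParseChapters and LoadChapters are hypothetical and chosen only for illustration. Collapsing runs of whitespace in the verse text is an assumption about what you want, since the CDATA sections in the sample span several indented lines.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical result types, not part of the original answer.
public class VerseEntry
{
    public int VerseID { get; set; }
    public string Text { get; set; }
}

public class ChapterEntry
{
    public int ChapterID { get; set; }
    public string ChapterName { get; set; }
    public List<VerseEntry> Verses { get; set; }
}

public static class QuranXml
{
    // Projects each <Chapter> and its <Verse> children into plain objects.
    public static List<ChapterEntry> ParseChapters(XDocument doc)
    {
        return doc.Root
            .Elements("Chapter")
            .Select(c => new ChapterEntry
            {
                ChapterID = (int)c.Attribute("ChapterID"),
                ChapterName = (string)c.Attribute("ChapterName"),
                Verses = c.Elements("Verse")
                    .Select(v => new VerseEntry
                    {
                        VerseID = (int)v.Attribute("VerseID"),
                        // XElement.Value returns the text of the element,
                        // CDATA content included. Assumption: line breaks and
                        // indentation inside the CDATA are not wanted, so
                        // they are collapsed to single spaces.
                        Text = Regex.Replace(v.Value, @"\s+", " ").Trim()
                    })
                    .ToList()
            })
            .ToList();
    }

    // Loads the file through an XmlReader that ignores the DTD,
    // mirroring the reader settings used in LoadQuran above.
    public static List<ChapterEntry> LoadChapters(string path)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (var reader = XmlReader.Create(path, settings))
        {
            return ParseChapters(XDocument.Load(reader));
        }
    }
}
```

The top-level attributes (Writer, Language and so on) can be read the same way from doc.Root, for example (string)doc.Root.Attribute("Writer").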

Upvotes: 1
