Isaiah Nelson

Reputation: 2490

Extracting inner value on CDATA with Linq to XML using a filter

I am using this code to retrieve the values I want from XML:

IEnumerable<ForewordReview> reviews = null;
try
{
    reviews = from item in xmlDoc.Descendants("node")
              select new ForewordReview()
              {
                  PubDate = item.Element("created").ToString(),
                  Isbn = item.Element("isbn").ToString(),
                  Summary = item.Element("review").ToString()
              };
} // ...

Incidentally, a client is now passing us almost every tag with CDATA which I need to extract:

<review>
    <node>
        <created>
            <![CDATA[2012-01-23 12:40:57]]>
        </created>
        <isbn>
            <![CDATA[123456789]]>
        </isbn>
        <summary>
            <![CDATA[Teh Kittehs like to play in teh mud]]>
        </summary>
    </node>
</review>

I have seen a couple of solutions for extracting these values from within the CDATA tag, one of which is to use a where clause on the LINQ statement:

where element.NodeType == System.Xml.XmlNodeType.CDATA

I sort of see what's going on here, but I am not sure this works with how I am using LINQ (specifically, building an object from the selected items).

Do I need to apply this filter to each item in the select statement individually? Otherwise, I don't really understand how this will work with the code I am using.

As always, I appreciate the help.

Upvotes: 1

Views: 3180

Answers (2)

Jon Hanna

Reputation: 113222

Remember that there is no difference between the meaning of:

<a>
 <b>Hello</b>
 <c>&amp; hello again</c>
</a>

and of

<a>
 <b><![CDATA[Hello]]></b>
 <c><![CDATA[& hello again]]></c>
</a>

Since you're calling ToString(), you get the entire element back, with opening and closing tags, entity references, etc. still in XML form, so you must be prepared to deal with it in XML form. If you aren't, the problem isn't with the code you show here; it's with the code that was okay with PubDate being "<created>2012-01-23 12:40:57</created>" and now isn't okay with it being the exactly equivalent "<created><![CDATA[2012-01-23 12:40:57]]></created>".

Either change that code to really parse the XML (for which the framework offers lots of things to help), or change it to take the date on its own and use Element("created").Value to retrieve it.
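To illustrate the difference between ToString() and Value, here is a minimal sketch. The class name CDataDemo and the helpers InnerText and Markup are made up for this example; only XElement.Parse, ToString() and Value come from LINQ to XML.

```csharp
using System;
using System.Xml.Linq;

public static class CDataDemo
{
    // Hypothetical helper: the element's text content, without any markup.
    public static string InnerText(string xml) => XElement.Parse(xml).Value;

    // Hypothetical helper: the element re-serialized as XML, which is what ToString() returns.
    public static string Markup(string xml) => XElement.Parse(xml).ToString();

    public static void Main()
    {
        const string escaped = "<c>&amp; hello again</c>";
        const string cdata = "<c><![CDATA[& hello again]]></c>";

        // Both forms carry the same text, so Value is identical for them.
        Console.WriteLine(InnerText(escaped) == InnerText(cdata)); // True

        // ToString() keeps the markup, including the CDATA section.
        Console.WriteLine(Markup(cdata));

        // Value returns only the text inside the element.
        Console.WriteLine(InnerText(cdata));
    }
}
```

So if the consuming code only wants the text, reading Value should make the CDATA wrapping irrelevant.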

Upvotes: 1

user7116

Reputation: 64068

Cast each XElement to a string instead:

reviews = from item in xmlDoc.Descendants("node")
          select new 
          {
              PubDate = (string)item.Element("created"),
              Isbn = (string)item.Element("isbn"),
              Summary = (string)item.Element("summary")
          };
// Output:
// {
//      PubDate = 2012-01-23 12:40:57,
//      Isbn = 123456789,
//      Summary = Teh Kittehs like to play in teh mud
// }

This also works with other data types, such as int, float, DateTime, etc:

reviews = from item in xmlDoc.Descendants("node")
          select new 
          {
              PubDate = (DateTime)item.Element("created")
          };
// Output:
// {
//      PubDate = 1/23/2012 12:40:57
// }

The same casts work with XAttributes.
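A rough sketch of the attribute case follows. The attribute names isbn, rating and pages, and the AttributeCastDemo class, are invented for illustration and are not part of the XML in the question.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeCastDemo
{
    // Hypothetical helper: reads two made-up attributes using the explicit casts on XAttribute.
    public static (string Isbn, int Rating) Read(XElement node) =>
        ((string)node.Attribute("isbn"), (int)node.Attribute("rating"));

    public static void Main()
    {
        var node = XElement.Parse("<node isbn=\"123456789\" rating=\"4\" />");

        var (isbn, rating) = Read(node);
        Console.WriteLine($"{isbn} {rating}");

        // Casting a missing attribute to a nullable type gives null instead of throwing.
        int? pages = (int?)node.Attribute("pages");
        Console.WriteLine(pages.HasValue); // False
    }
}
```

Casting to string or to a nullable type is the safer choice when an element or attribute may be absent; casting a missing one to a non-nullable type such as int throws.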

Upvotes: 8
