user1732337

Reputation: 23

LINQ to XML: extract text between tags

I'm using a LINQ query to extract data from an XML file. I'm trying to extract both the data in the tags (their attributes) and the extra data, i.e. the text that follows each tag, preserving the order. In other words, given the following XML excerpt:

<item>
 <elementA id="1" value="aaaa">zazaz
 <elementB id="2" value="bbbb">wwwww
 <elementC id="3" value="cccc">sssss
</item>

I'm using the following statements to extract it:

using System;
using System.Text;
using System.Xml.Linq;

// Quotes inside a verbatim string literal must be doubled ("").
XElement root = XElement.Parse(@"
   <item>
         <elementA id=""1"" value=""aaaa""/>zazaz
         <elementB id=""2"" value=""bbbb""/>wwwww
         <elementC id=""3"" value=""cccc""/>sssss
   </item>");
var nav = root.Descendants();
StringBuilder content = new StringBuilder();
foreach (var x in nav)
{
    content.Append(x.Name.LocalName)
           .Append(": id=")
           .Append(x.Attribute("id").Value)
           .Append(": value=")
           .Append(x.Attribute("value").Value)
           .Append(" extra data= ")
           .Append(x.Value)
           .Append("\n");
}
Console.WriteLine(content.ToString());

It outputs:

elementA: id=1: value=aaaa extra data=
elementB: id=2: value=bbbb extra data=
elementC: id=3: value=cccc extra data=

instead of:

elementA: id=1: value=aaaa extra data= zazaz
elementB: id=2: value=bbbb extra data= wwwww
elementC: id=3: value=cccc extra data= sssss

So with ".Value" the text between the tags is not extracted. Is there a way to get it?

Upvotes: 0

Views: 879

Answers (2)

Alexander Petrov

Reputation: 14251

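Use Nodes() instead of Descendants() and check each node's NodeType. In your XML each element is self-closing, so the text after it is not the element's content but a separate text node (XText) that is a sibling of the element. Descendants() returns only elements, so it never visits that text, and x.Value is empty.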

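// Requires "using System.Xml;" for XmlNodeType; root and content are declared as in the question.
// Nodes() returns the direct child nodes, text as well as elements, in document order.
foreach (XNode node in root.Nodes())
{
    if (node.NodeType == XmlNodeType.Element)
    {
        // An element: write its name and attributes.
        XElement elem = (XElement)node;

        content.Append(elem.Name.LocalName)
            .Append(": id=")
            .Append(elem.Attribute("id").Value)
            .Append(": value=")
            .Append(elem.Attribute("value").Value);
    }
    else if (node.NodeType == XmlNodeType.Text)
    {
        // The text following the element: trim the surrounding whitespace and newlines.
        XText text = (XText)node;

        content.Append(" extra data= ")
            .Append(text.Value.Trim())
            .AppendLine();
    }
}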
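As a variant of the same idea (not part of the original answer), you can start from the elements and pair each one with the text node that immediately follows it, using XNode.NextNode. This is a minimal sketch that assumes the question's layout, where each element is self-closing and its extra data is its next sibling; Describe is a hypothetical helper name, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// The question's XML: each element is self-closing and its "extra data"
// is the text node that follows it, i.e. its next sibling.
XElement root = XElement.Parse(@"
   <item>
         <elementA id=""1"" value=""aaaa""/>zazaz
         <elementB id=""2"" value=""bbbb""/>wwwww
         <elementC id=""3"" value=""cccc""/>sssss
   </item>");

string report = Describe(root);
Console.Write(report);

// Hypothetical helper: pairs every child element with the text node that
// immediately follows it (if any), instead of switching on NodeType.
static string Describe(XElement parent)
{
    var sb = new StringBuilder();
    foreach (XElement elem in parent.Elements())
    {
        // NextNode is the following sibling node: an XText when text follows
        // the element, otherwise another element or null.
        string extra = (elem.NextNode as XText)?.Value.Trim() ?? "";
        sb.Append(elem.Name.LocalName)
          .Append(": id=").Append((string)elem.Attribute("id"))
          .Append(": value=").Append((string)elem.Attribute("value"))
          .Append(" extra data= ").Append(extra)
          .AppendLine();
    }
    return sb.ToString();
}
```

Compared with switching on NodeType, this always writes one line per element, even when an element has no text after it (its extra data is then empty).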

Upvotes: 3

Mike Hofer

Reputation: 17022

Your XML appears to be malformed. XML is a strict format: every element must be closed, either with a matching end tag or as a self-closing tag.

<item>
 <elementA id="1" value="aaaa">zazaz</elementA>
 <elementB id="2" value="bbbb">wwwww</elementB>
 <elementC id="3" value="cccc">sssss</elementC>
</item>

Close your tags so that each piece of text becomes the content of its element, and the code will likely work.

Per the MSDN documentation for the XElement.Value property, Value returns "A String that contains all of the text content of this element. If there are multiple text nodes, they will be concatenated."
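A short sketch with made-up sample strings (C# top-level statements) illustrates both halves of this: Value concatenates nested text, while text written after a self-closing element belongs to the parent rather than to that element.

```csharp
using System;
using System.Xml.Linq;

// Value concatenates every descendant text node, in document order.
XElement mixed = XElement.Parse("<a>x<b>y</b>z</a>");
Console.WriteLine(mixed.Value);                  // xyz

// A self-closing element has no text nodes of its own, so its Value is
// empty; text written after it belongs to the parent instead.
XElement parent = XElement.Parse("<item><elementA id='1'/>zazaz</item>");
XElement elementA = parent.Element("elementA");
Console.WriteLine(elementA.Value.Length);        // 0
Console.WriteLine(parent.Value);                 // zazaz

// With an end tag, the same text becomes elementA's own content.
XElement closed = XElement.Parse("<item><elementA id='1'>zazaz</elementA></item>");
Console.WriteLine(closed.Element("elementA").Value); // zazaz
```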

Given this, I think you're on the right track, and the issue is likely your data format.

UPDATE

I modified your XML as shown above and ran your code in LINQPad 4. It returned:

elementA: id=1: value=aaaa extra data= zazaz
elementB: id=2: value=bbbb extra data= wwwww
elementC: id=3: value=cccc extra data= sssss

It does, indeed, appear to be an issue with the data format.

Upvotes: 0
