Zettel

Reputation: 3

Remove Leading & Trailing Whitespaces from XML Node value

I am looking for a way to remove leading and trailing whitespace from an XML node's value. Given the following basic example:

<CAR>
  <MAKE>   Ford   </MAKE>
  <COLOR>   Yellow  </COLOR>
  <!--<YEAR>  1987   </YEAR>-->
</CAR>

I need to get the following output:

<CAR>
  <MAKE>Ford</MAKE>
  <COLOR>Yellow</COLOR>
  <!--<YEAR>  1987   </YEAR>-->
</CAR>

I managed to do this by applying the following two regular expressions in succession:

>\s*[^a-zA-Z0-9^<]*

[^a-zA-Z0-9^>]*\s*</

My knowledge of regex is very limited, so this was all I could come up with. The problem is that I end up with a broken XML document whenever the file contains comments.

So, can anyone help me find an expression that removes leading and trailing whitespace from the values while leaving any comments intact?

I hope I made myself clear. Thank you in advance!

Upvotes: 0

Views: 2043

Answers (3)

Akash

Reputation: 99

Try this:

Regex.Replace(inputString, @"(([^\s]+)\s+)", "$2");
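If comments must stay untouched, as the question asks, one possible approach is to let the pattern match whole comments first and write them back unchanged. The following is only a sketch under assumptions: the class name `TrimXmlText` and the helper `TrimValues` are made up for illustration, and the pattern does not account for CDATA sections, processing instructions, or a `>` inside attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

class TrimXmlText
{
    // Alternative 1 (group 1): a whole comment, captured so it can be re-emitted.
    // Alternative 2: whitespace right after '>' that is followed by text.
    // Alternative 3: whitespace right before "</" that is preceded by text.
    static readonly Regex Pattern = new Regex(
        @"(<!--[\s\S]*?-->)|(?<=>)\s+(?=[^<\s])|(?<=[^>\s])\s+(?=</)");

    // Hypothetical helper, not part of any library.
    static string TrimValues(string xml)
    {
        // "$1" writes a matched comment back as-is. For the two whitespace
        // alternatives group 1 did not participate, so nothing is written.
        return Pattern.Replace(xml, "$1");
    }

    static void Main()
    {
        string xml =
            "<CAR>\n" +
            "  <MAKE>   Ford   </MAKE>\n" +
            "  <COLOR>   Yellow  </COLOR>\n" +
            "  <!--<YEAR>  1987   </YEAR>-->\n" +
            "</CAR>";

        Console.WriteLine(TrimValues(xml));
    }
}
```

Because the comment alternative comes first and the engine scans left to right, anything inside `<!-- ... -->` is consumed as one match before the whitespace alternatives can see it. Indentation between tags is left alone, since that whitespace is followed by `<` rather than by text.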

Upvotes: 0

Enigmativity

Reputation: 117057

If you don't mind not using Regex then this works:

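using System.Linq;
using System.Xml;
using System.Xml.Linq;

var doc = XDocument.Parse(@"<CAR>
  <MAKE>   Ford   </MAKE>
  <COLOR>   Yellow  </COLOR>
  <!--<YEAR>  1987   </YEAR>-->
</CAR>");

// Find every element that directly contains a text node. Comments are a
// different node type, so they are never touched.
foreach (var xe in doc.DescendantNodes()
    .Where(n => n.NodeType == XmlNodeType.Text)
    .Select(x => x.Parent)
    .ToArray())   // snapshot first: assigning Value modifies the tree
{
    xe.Value = xe.Value.Trim();
}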

It updates the doc and gives me this:

<CAR>
  <MAKE>Ford</MAKE>
  <COLOR>Yellow</COLOR>
  <!--<YEAR>  1987   </YEAR>-->
</CAR>
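One caveat: assigning to an element's `Value` replaces all of its child nodes with a single text node, so an element that mixes text with child elements would lose those children. A variation that trims the text nodes themselves avoids this. This is a minimal sketch rather than a drop-in replacement; the class name `TrimTextNodes` is made up for illustration, and because `XCData` derives from `XText`, CDATA sections would be trimmed as well.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class TrimTextNodes
{
    static void Main()
    {
        var doc = XDocument.Parse(
            "<CAR>\n" +
            "  <MAKE>   Ford   </MAKE>\n" +
            "  <COLOR>   Yellow  </COLOR>\n" +
            "  <!--<YEAR>  1987   </YEAR>-->\n" +
            "</CAR>");

        // Take a snapshot of the text nodes, then trim each one in place.
        // Comment nodes are not XText, so they are skipped automatically.
        foreach (var text in doc.DescendantNodes().OfType<XText>().ToList())
        {
            text.Value = text.Value.Trim();
        }

        Console.WriteLine(doc);
    }
}
```

Only the text node is modified here, so sibling elements and comments inside the same parent stay where they are.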

Upvotes: 0

Veverke

Reputation: 11358

I see no need for regexes here. You will have to loop over your XML nodes in any case, so why not simply loop over the node values and call .Trim() on them?

For example:

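    using System;
    using System.Xml.Linq;

    var xml = XDocument.Load("D:/myXml.xml");

    // Visit the elements two levels below the root and print each trimmed value.
    foreach (var node in xml.Root.Elements())
    {
        foreach (var child in node.Elements())
        {
            Console.WriteLine(string.Format("[{0}]", child.Value.Trim()));
        }
    }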

My sample XML file: [image]

The output (I surrounded each value with [ ] so you can see the whitespace is gone): [image]

Upvotes: 0
