Reputation: 3
I am looking for a way to remove leading and trailing whitespace from an XML node's value. Given the following basic example:
<CAR>
<MAKE> Ford </MAKE>
<COLOR> Yellow </COLOR>
<!--<YEAR> 1987 </YEAR>-->
</CAR>
I need to get the following output:
<CAR>
<MAKE>Ford</MAKE>
<COLOR>Yellow</COLOR>
<!--<YEAR> 1987 </YEAR>-->
</CAR>
I managed to get this working by applying the following two regexes in succession:
>\s*[^a-zA-Z0-9^<]*
[^a-zA-Z0-9^>]*\s*</
As my knowledge regarding regex is very limited, this was all I could come up with. The problem is that I ended up with a broken XML document whenever the file contained comments.
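To see why the comment breaks, here is a small C# sketch that applies the two patterns exactly as written. The replacement strings ">" and "</" are an assumption, since the question does not show them. Note also that "^" inside a character class only negates when it is the first character, so the second "^" in each pattern is a literal caret.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample from the question, built with explicit "\n" so the result does not
// depend on the line endings of the source file.
string input =
    "<CAR>\n" +
    "<MAKE> Ford </MAKE>\n" +
    "<COLOR> Yellow </COLOR>\n" +
    "<!--<YEAR> 1987 </YEAR>-->\n" +
    "</CAR>";

string result = ApplyQuestionRegexes(input);
Console.WriteLine(result);

// Applies the two patterns from the question in order. The replacement
// strings (">" and "</") are assumed; the question does not show them.
static string ApplyQuestionRegexes(string xml)
{
    // After a '>', eat whitespace plus any run of characters that are not
    // letters, digits, '^' or '<'. '-' and '>' are not excluded, so the
    // "-->" that closes the comment is swallowed by this step.
    string step1 = Regex.Replace(xml, @">\s*[^a-zA-Z0-9^<]*", ">");
    // Before a "</", eat whitespace and other non-alphanumeric characters.
    return Regex.Replace(step1, @"[^a-zA-Z0-9^>]*\s*</", "</");
}
```

Under those assumptions the result is <CAR><MAKE>Ford</MAKE><COLOR>Yellow</COLOR><!--<YEAR>1987</YEAR></CAR>: the comment's closing "-->" is consumed by the first pattern, so the document is no longer well-formed.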
So, can anyone help me with an expression that removes leading and trailing whitespace from the values while leaving any comments intact?
I hope I made myself clear. Thank you in advance!
Upvotes: 0
Views: 2043
Reputation: 117057
If you don't mind not using a regex, then this works:
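using System.Linq;
using System.Xml;       // XmlNodeType
using System.Xml.Linq;  // XDocument
var doc = XDocument.Parse(@"<CAR>
<MAKE> Ford </MAKE>
<COLOR> Yellow </COLOR>
<!--<YEAR> 1987 </YEAR>-->
</CAR>");
// Collect every element that directly contains a text node. Comments are
// XComment nodes, not text nodes, so they are never selected.
foreach (var xe in doc.DescendantNodes()
    .Where(n => n.NodeType == XmlNodeType.Text)
    .Select(x => x.Parent)
    .ToArray()) // materialize first, because the loop modifies the tree
{
    // Setting Value replaces all of xe's child nodes with one text node.
    xe.Value = xe.Value.Trim();
}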
It updates the doc in place and gives me this:
<CAR>
<MAKE>Ford</MAKE>
<COLOR>Yellow</COLOR>
<!--<YEAR> 1987 </YEAR>-->
</CAR>
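One caveat: because assigning XElement.Value replaces all of an element's children, the loop above would delete child elements in mixed content such as <p> a <b>x</b></p>. A possible variation, sketched below, trims the XText nodes themselves instead of going through their parent elements. TrimTextNodes is a hypothetical helper name; note that in mixed content this also removes the spaces between words and child elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var doc = XDocument.Parse(
    "<CAR>\n" +
    "<MAKE> Ford </MAKE>\n" +
    "<COLOR> Yellow </COLOR>\n" +
    "<!--<YEAR> 1987 </YEAR>-->\n" +
    "</CAR>");

TrimTextNodes(doc);
Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting));

// Hypothetical helper: trims every text node in place. XComment does not
// derive from XText, so comment content is never touched, and no element
// loses its child elements.
static void TrimTextNodes(XContainer root)
{
    // ToList() snapshots the text nodes before any of them are modified.
    foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
    {
        text.Value = text.Value.Trim();
    }
}
```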
Upvotes: 0
Reputation: 11358
I see no need for regexes here. You will have to loop over your XML nodes in any case, so why not simply loop over the node values and call .Trim() on them?
For example:
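using System;
using System.Linq;
using System.Xml.Linq;
var xml = XDocument.Load("D:/myXml.xml");
// Visit every leaf element (one without child elements) at any depth, so this
// also works when <CAR> itself is the root, as in the question's sample.
foreach (var element in xml.Descendants().Where(e => !e.HasElements))
{
    Console.WriteLine("[{0}]", element.Value.Trim());
}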
In the output, I surround each value with [ ] so you can see that the whitespace is gone.
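The loop above only prints the trimmed values. If the goal is to change the file itself, a minimal sketch is to assign the trimmed value back and save the document. The temp-file paths and the TrimLeafValues helper are hypothetical, chosen only so the sketch runs on its own; the answer's own code reads "D:/myXml.xml".

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical paths for illustration only.
string inputPath = Path.Combine(Path.GetTempPath(), "myXml.xml");
string outputPath = Path.Combine(Path.GetTempPath(), "myXml.trimmed.xml");

// Write the question's sample so the sketch is self-contained.
File.WriteAllText(inputPath,
    "<CAR>\n<MAKE> Ford </MAKE>\n<COLOR> Yellow </COLOR>\n" +
    "<!--<YEAR> 1987 </YEAR>-->\n</CAR>");

TrimLeafValues(inputPath, outputPath);
Console.WriteLine(File.ReadAllText(outputPath));

// Hypothetical helper: loads a file, trims every non-empty leaf element's
// value and saves the result to a new file.
static void TrimLeafValues(string source, string destination)
{
    XDocument xml = XDocument.Load(source);
    // Leaf elements only: assigning Value on an element that has child
    // elements would replace those children with plain text. Empty elements
    // are skipped so <X/> is not rewritten as <X></X>.
    foreach (XElement element in xml.Descendants()
                                    .Where(e => !e.HasElements && !e.IsEmpty)
                                    .ToList())
    {
        element.Value = element.Value.Trim();
    }
    // Comments are XComment nodes, not elements, so they are saved unchanged.
    xml.Save(destination);
}
```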
Upvotes: 0