Reputation:
After my clean-up I ended up with a bunch of empty tags, and I'd like to remove them. This is the expression I've been using so far:
Regex.Replace(clean, "(<[/a-zA-Z]+)([^>]*?)(>)", "$1$3");
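A quick way to see what the clean-up expression does is to run it on a small sample. The sketch below uses made-up input (an assumption, not the asker's data) and a greedy "+" on the tag name. With a lazy "+?" the following "[^>]*?" can absorb the rest of the name, so <span style="..."> would shrink to <s> and </span> to </>. The attributes are dropped, and a tag that had attributes but no content is left as an empty pair like <span></span>, which is what the question wants to remove next.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input (not from the question)
string clean = "<p class=\"intro\">Hello</p><span style=\"color:red\"></span>";

// Group 1: "<" plus optional "/" and the tag name (greedy, so the whole name is kept)
// Group 2: the attributes, which are dropped; group 3: the closing ">"
string stripped = Regex.Replace(clean, "(<[/a-zA-Z]+)([^>]*?)(>)", "$1$3");

Console.WriteLine(stripped); // <p>Hello</p><span></span>
```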
I've seen a discussion here, but it didn't make things clear for me. How do I make sure that the opening and closing tags I find have the same name (so they match as a pair), apart from the slash?
Upvotes: 2
Views: 4829
Reputation: 1
I found a way to remove all empty tags (with or without a class attribute).
The regex I found is:
<\s*[^>/]*>((&nbsp;)*|\s*)</\s*[^></]*>
Look at the following example:
<span class="test1"></span>
<span class= "test2">That's a text</span>
That regex will delete only the empty test1 span and leave the test2 span untouched.
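Here is a small C# sketch that applies this regex to the example above. It assumes the "( )*" group was originally "(&nbsp;)*" (i.e. the tag may contain only non-breaking-space entities) and that the garbling came from the page's HTML being decoded; the second sample string is made up to exercise that branch.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<span class=\"test1\"></span><span class= \"test2\">That's a text</span>";

// Opening tag, then only &nbsp; entities or only whitespace, then any closing tag.
// Assumption: the "( )*" group in the answer was meant to be "(&nbsp;)*".
const string pattern = @"<\s*[^>/]*>((&nbsp;)*|\s*)</\s*[^></]*>";

string result = Regex.Replace(html, pattern, string.Empty);
Console.WriteLine(result); // <span class= "test2">That's a text</span>

// Hypothetical input containing only &nbsp; entities inside the tag
string nbsp = Regex.Replace("<b>&nbsp;&nbsp;</b>x", pattern, string.Empty);
Console.WriteLine(nbsp); // x
```

Note that the closing-tag part, "</\s*[^></]*>", accepts any name, so this relies on the input being well-formed rather than on checking that the names match.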
I hope that helps! :)
Upvotes: 0
Reputation: 116188
This will be a late answer, but as I said in your previous question:
Don't try to parse XML/HTML with regex; use a real XML parser to process XML.
Although it can work for some simple cases, it brings more trouble when it comes to maintenance and handling corner cases.
Using LINQ to XML:
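using System;
using System.Linq;
using System.Xml.Linq;
class Program
{
    static void Main()
    {
        var xml = @"<root>
<notempty>text</notempty>
<empty1><empty2><empty3/></empty2></empty1>
</root>";
        var xDoc = XDocument.Parse(xml);
        RemoveEmptyNodes(xDoc.Root);
        // Print the result, or use xDoc.Save(fileName2) to write it to a file
        Console.WriteLine(xDoc);
    }
    // Recursively removes every descendant whose text content is empty or whitespace only
    static void RemoveEmptyNodes(XElement xRoot)
    {
        foreach (var xElem in xRoot.Descendants().ToList())
        {
            RemoveEmptyNodes(xElem);
            // (string)xElem is the concatenated text of the element and its descendants;
            // the Parent check skips elements that the recursive call already removed
            if (String.IsNullOrWhiteSpace((string)xElem) && xElem.Parent != null)
                xElem.Remove();
        }
    }
}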
The output would be (this also handles the case mentioned by @kirmir):
<root>
<notempty>text</notempty>
</root>
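The same idea can be written more compactly with the LINQ to XML Extensions.Remove method, which removes every node in a sequence from its parent. This is a variant of the approach above, not the answer's own code; it uses the same sample document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var xDoc = XDocument.Parse(@"<root>
<notempty>text</notempty>
<empty1><empty2><empty3/></empty2></empty1>
</root>");

// Select every element below the root whose text content (including descendants)
// is empty or whitespace, then detach them all. Extensions.Remove works on a
// snapshot of the sequence, so removing while selecting is safe.
xDoc.Root.Descendants()
    .Where(e => string.IsNullOrWhiteSpace((string)e))
    .Remove();

string output = xDoc.Root.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(output); // <root><notempty>text</notempty></root>
```

Like the recursive version, this treats an element with attributes but no text (for example an empty <img src="..."/>) as empty, so add a check such as !e.HasAttributes if those should be kept.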
Upvotes: 2
Reputation: 39298
I don't think you need to check whether they're of the same kind, assuming you have a valid XML structure. If so, there can't be anything of the form:
<someTagStarts></anOtherTagEnds>
So you can use the following regex.
Regex.Replace(input, "<[^>/][^>]*></[^>]*>", "");
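One pass of this regex only removes the innermost empty pairs: with nested empty tags such as <a><b></b></a>, the outer pair becomes empty only after the first replacement. A minimal sketch that repeats the replacement until nothing changes (the loop and the sample input are additions, not part of the answer):

```csharp
using System;
using System.Text.RegularExpressions;

var emptyTag = new Regex("<[^>/][^>]*></[^>]*>");

// Hypothetical input with a nested empty element
string input = "<root><a><b></b></a><c>text</c></root>";

// Each pass removes the innermost empty pairs, which may leave their parents empty,
// so keep going until the string stops changing.
string previous;
string current = input;
do
{
    previous = current;
    current = emptyTag.Replace(current, string.Empty);
} while (current != previous);

Console.WriteLine(current); // <root><c>text</c></root>
```

Because "[^>]*" may contain a slash, a self-closing tag directly followed by a closing tag (for example <br/></p>) is matched as well, so this only holds up for input without self-closing tags.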
I also found this link, but I'm not sure why they use a plus instead of a star in the closing tag. It might be worth asking about.
Realizing that you might also need to remove tags that are only seemingly empty (containing whitespace and the like), I can build on Sina's solution and add the following:
Regex.Replace(input, @"<([^\s>/]+)(?:\s[^>]*)?>((&nbsp;)*|\s*)</\1>", String.Empty);
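A sketch of this whitespace-tolerant version in action. It assumes the "( )*" group was meant to be "(&nbsp;)*", and it captures only the element name in group 1 so that "</\1>" can still match the closing tag when the opening tag has attributes. The sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 = element name only; optional attributes follow after whitespace.
// Content may be only &nbsp; entities (assumed original of "( )*") or only whitespace.
var seeminglyEmpty = new Regex(@"<([^\s>/]+)(?:\s[^>]*)?>((&nbsp;)*|\s*)</\1>");

// Hypothetical input: two seemingly empty tags, one real one, one mismatched pair
string input = "<p class=\"x\">  </p><i>&nbsp;</i><b>keep</b><u> </s>";
string result = seeminglyEmpty.Replace(input, string.Empty);

Console.WriteLine(result); // <b>keep</b><u> </s>
```

The mismatched <u> </s> pair survives because the backreference requires the closing name to equal the captured opening name.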
It's somewhere around here that regex goes from a cute experience to a nasty one. :)
Upvotes: 2
Reputation: 16296
You can use a backreference to make sure the name of the closing element matches that of the opening tag. This is the pattern I got by extending Konrad's solution:
result = Regex.Replace(input, @"<([^\s>/]+)(?:\s[^>]*)?></\1>", String.Empty);
Here \1 refers to the first capture group in the pattern (the part in parentheses), which matches the name of the opening element.
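To see what \1 refers to, the sketch below prints the first capture group of each match before removing them. It captures only the element name in group 1, "([^\s>/]+)", with optional attributes after it, so that a tag like <span class="a"></span> still pairs up with its closing tag; the sample input is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

var emptyPair = new Regex(@"<([^\s>/]+)(?:\s[^>]*)?></\1>");

// Hypothetical input with two empty elements inside a non-empty one
string input = "<div><span class=\"a\"></span><em></em>text</div>";

// Print the element name captured by group 1 for every empty pair found
foreach (Match m in emptyPair.Matches(input))
    Console.WriteLine($"{m.Groups[1].Value}: {m.Value}");

string result = emptyPair.Replace(input, string.Empty);
Console.WriteLine(result); // <div>text</div>
```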
Upvotes: 3