user1675891

How to remove empty tags using regex?

After my clean-up I ended up with a bunch of empty tags. I'd like to remove them, but the expression I've been using so far only strips the attributes and leaves the empty tags in place:

Regex.Replace(clean, "(<[/a-zA-Z]+?)([^>]*?)(>)", "$1$3");

I've seen a discussion here but it didn't make things clear to me. How do I make sure that the names found in the opening and the closing tag are the same (so that they match as a pair), except for the slash?

Upvotes: 2

Views: 4829

Answers (4)

Arthurvcs

Reputation: 1

I found a way to remove all empty tags (whether or not they have a class).

The regex solution that I found is:

<\s*[^>/]*>((&nbsp;)*|\s*)</\s*[^></]*>

Look at the following example:

<span class="test1"></span> <span class= "test2">That´s a text</span>

That regex will delete only the empty test1 span; the test2 span, which contains text, is kept.

I hope that helps you! :)
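To make the behaviour concrete, here is a minimal sketch that runs the pattern above over the example line. The class name `EmptyTagDemo` and the helper `RemoveEmptyTags` are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyTagDemo
{
    // The pattern from the answer above, unchanged.
    private const string Pattern = @"<\s*[^>/]*>((&nbsp;)*|\s*)</\s*[^></]*>";

    // Hypothetical helper: removes every match in a single pass.
    public static string RemoveEmptyTags(string html)
    {
        return Regex.Replace(html, Pattern, String.Empty);
    }

    public static void Main()
    {
        string input = "<span class=\"test1\"></span> <span class= \"test2\">That's a text</span>";

        // Only the empty test1 span is removed; the space between the spans stays.
        Console.WriteLine(RemoveEmptyTags(input));
        // prints " <span class= "test2">That's a text</span>"
    }
}
```

Note that this pattern never compares the opening and closing tag names, so it would also remove a mismatched pair such as `<a></b>`.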

Upvotes: 0

L.B

Reputation: 116188

This will be a late answer, but as I said in your previous question:

Don't try to parse XML/HTML with regex; use a real XML parser to process XML.

Although it can work for some simple cases, it brings more trouble during maintenance and when handling corner cases.

Using LINQ to XML:

var xml = @"<root>
            <notempty>text</notempty>
            <empty1><empty2><empty3/></empty2></empty1>
            </root>";

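// Requires: using System; using System.Linq; using System.Xml.Linq;
var xDoc = XDocument.Parse(xml);
RemoveEmptyNodes(xDoc.Root);
xDoc.Save(fileName2); // fileName2: path of the output file

// Removes, bottom-up, every element whose text content is empty or whitespace.
void RemoveEmptyNodes(XElement xRoot)
{
    // ToList() takes a snapshot, so removing elements while iterating is safe.
    foreach (var xElem in xRoot.Elements().ToList())
    {
        // Clean the subtree first, so a parent of only-empty children becomes empty too.
        RemoveEmptyNodes(xElem);
        if (String.IsNullOrWhiteSpace((string)xElem))
            xElem.Remove();
    }
}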

The output would be (handling the case mentioned by @kirmir):

<root>
    <notempty>text</notempty>
</root>

Upvotes: 2

Konrad Viltersten

Reputation: 39298

I don't think you need to check whether they're of the same kind, assuming that you have a valid XML structure. If so, there can't be anything of the form:

<someTagStarts></anOtherTagEnds>

So you can use the following regex.

Regex.Replace(input, "<[^>/][^>]*></[^>]*>", "");

I also found this link, but I'm not sure why they're using a plus instead of a star at the closing tag. Better to ask about it.

Realizing that you might also need to remove tags that are only seemingly empty (they contain whitespace and the like), I can build on Sina's solution and add the following.

Regex.Replace(input, @"<([^>/][^>]*)>((&nbsp;)*|\s*)</\1>", String.Empty);

It's somewhere around here that regex goes from a cute to a nasty experience. :)
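As a rough illustration of what the whitespace-aware pattern does and does not catch, here is a small sketch. The names `WhitespaceTagDemo` and `RemoveBlankTags` are made up for the example; the pattern is the one given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceTagDemo
{
    // The pattern from the answer above, unchanged.
    private const string Pattern = @"<([^>/][^>]*)>((&nbsp;)*|\s*)</\1>";

    // Hypothetical helper: removes every match in a single pass.
    public static string RemoveBlankTags(string input)
    {
        return Regex.Replace(input, Pattern, String.Empty);
    }

    public static void Main()
    {
        // Tags holding only whitespace or only &nbsp; entities are removed.
        Console.WriteLine(RemoveBlankTags("<p> </p><b>text</b><i>&nbsp;&nbsp;</i>"));
        // prints "<b>text</b>"

        // The alternation accepts EITHER &nbsp; entities OR whitespace, not a mix,
        // so this input is left untouched.
        Console.WriteLine(RemoveBlankTags("<p>&nbsp; </p>"));
        // prints "<p>&nbsp; </p>"
    }
}
```

If mixed content should count as empty as well, the middle group could presumably be changed to something like `(&nbsp;|\s)*`, but that is an extension rather than part of the answer.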

Upvotes: 2

Sina Iravanian

Reputation: 16296

You can use a backreference to make sure the name of the closing tag matches that of the opening tag. This is the pattern I've got by extending Konrad's solution:

result = Regex.Replace(input, @"<([^>/][^>]*)></\1>", String.Empty);

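Here \1 refers to the first captured group, delimited by the parentheses in the pattern, which surround the name of the opening element. Note that the group captures everything between < and >, so an opening tag that carries attributes will not match its closing tag.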
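The sketch below compares the pattern above with a variant that captures only the tag name, and repeats the replacement so that tags emptied by an earlier pass are removed as well. The variant pattern, the repeat-until-stable loop, and the names `BackreferenceDemo` and `RemoveUntilStable` are assumptions added for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Pattern from the answer above: group 1 captures everything between '<' and '>'.
    public static readonly Regex Original = new Regex(@"<([^>/][^>]*)></\1>");

    // Assumed variant (not from the answer): group 1 captures only the tag name,
    // so attributes after the name do not break the \1 comparison.
    public static readonly Regex NameOnly = new Regex(@"<([A-Za-z][A-Za-z0-9]*)\b[^>]*></\1>");

    // Hypothetical helper: repeats the replacement until nothing changes,
    // so an element that becomes empty after a pass is removed by the next one.
    public static string RemoveUntilStable(Regex pattern, string input)
    {
        string previous;
        do
        {
            previous = input;
            input = pattern.Replace(input, String.Empty);
        } while (input != previous);
        return input;
    }

    public static void Main()
    {
        string html = "<div><span></span></div><p class=\"x\"></p><b>kept</b>";

        // The original pattern skips the tag with an attribute.
        Console.WriteLine(RemoveUntilStable(Original, html));
        // prints "<p class="x"></p><b>kept</b>"

        // The name-only variant removes it as well.
        Console.WriteLine(RemoveUntilStable(NameOnly, html));
        // prints "<b>kept</b>"
    }
}
```

A single `Regex.Replace` call only removes the innermost empty pair, which is why the loop is there; for deeply nested or irregular markup a real parser remains the safer choice.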

Upvotes: 3
