ffffff01

Reputation: 5238

Add character to empty html tags with c# and regex

I want to find all empty HTML tags in a string, e.g.:

<div></div>
<span>test</span>
<a></a>

and add a space or a character to all of the empty tags in that string:

<div>something</div>
<span>test</span>
<a>something</a>

I've got a regex that matches all empty tags, but I'm not sure what the best way to replace the tags is.

Regex:

<(\w+)(?:\s+\w+="[^"]+(?:"\$[^"]+"[^"]+)?")*>\s*</\1>

Upvotes: 1

Views: 300

Answers (3)

Ro Yo Mi

Reputation: 15010

Description

Handling this via regex is probably not the best way to go. However, there may be reasons for using a regular expression anyway, such as "I'm not allowed to install HTMLAgilityPack". In that case, this expression will:

  • find all tags that consist of an open tag immediately followed by its close tag
  • avoid many of the edge cases that make pattern matching in HTML with regex difficult

Regex: (<(\w+)(?=\s|>)(?:[^'">=]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>)(<\/\2>)

Replace with: $1~~~NewValue~~~$3

Example

Sample Text

Note that the first line contains some really difficult edge cases:

<a onmouseover=' str=" <a></a> " ; if ( 6 > 4 ) { funDoSomething(str); } '></a>
<div></div>
<span>test</span>
<a></a>

Text After Replacement

<a onmouseover=' str=" <a></a> " ; if ( 6 > 4 ) { funDoSomething(str); } '>~~~NewValue~~~</a>
<div>~~~NewValue~~~</div>
<span>test</span>
<a>~~~NewValue~~~</a>
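
Since the question asks for C#, here is a minimal sketch of how the pattern and replacement above could be applied with `Regex.Replace`. The class name `EmptyTagFiller` and the method `Fill` are hypothetical names chosen for illustration. The pattern is copied unchanged from this answer, with the double quotes doubled as a C# verbatim string requires.

```csharp
using System.Text.RegularExpressions;

public static class EmptyTagFiller
{
    // Pattern copied from the answer above. Group 1 is the opening tag,
    // group 2 the tag name, group 3 the matching closing tag.
    private static readonly Regex EmptyTag = new Regex(
        @"(<(\w+)(?=\s|>)(?:[^'"">=]*|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>)(<\/\2>)");

    // Hypothetical helper: inserts `filler` between each empty open/close pair.
    public static string Fill(string html, string filler)
    {
        // A MatchEvaluator is used instead of a "$1...$3" replacement string,
        // so a filler containing '$' is inserted literally.
        return EmptyTag.Replace(html, m => m.Groups[1].Value + filler + m.Groups[3].Value);
    }
}
```

Calling `EmptyTagFiller.Fill(html, "something")` on the three-line sample from the question should leave `<span>test</span>` untouched and fill the `div` and `a` tags. Note that this pattern only matches tags with nothing at all between them, so a tag containing only whitespace is not treated as empty.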

Upvotes: 1

Anirudha

Reputation: 32807

Use HtmlAgilityPack

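using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// The root is DocumentNode; SelectNodes returns null when nothing matches.
var nodes = doc.DocumentNode.SelectNodes("//*");
if (nodes != null)
{
    // Fill each element that has no content; "input" is the text to insert.
    foreach (HtmlNode node in nodes.Where(x => string.IsNullOrWhiteSpace(x.InnerHtml)).ToList())
        node.AppendChild(doc.CreateTextNode(input));
}
doc.Save(yourFile);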

Upvotes: 3

Raidri

Reputation: 17550

Use Html Agility Pack for HTML parsing, never regex.

Upvotes: 0
