user2160425

Reputation: 41

Remove consecutive <br> from string using regex c#

I have the following string:

"choose to still go on the trip. <br><br>\r\nNote that when booking"

I need a regex that replaces consecutive <br> tags with a single <br>, so that the string looks like this:

"choose to still go on the trip. <br>Note that when booking"

Upvotes: 3

Views: 2498

Answers (5)

L-Four

Reputation: 13531

This can be done in another (safer) way, using HTML Agility Pack (open source project http://html-agility-pack.net).

It takes into account the various notations <br>, <br/>, <br /> without you having to worry about it. This means you can focus on the actual task: replacing duplicates.

See "Remove chain of duplicate elements with HTML Agility Pack", which explains one approach to replacing duplicates.
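As a rough sketch of what that could look like, the helper below walks every <br> element and removes any <br> siblings that follow it directly, along with whitespace-only text between them. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names are hypothetical and this is not the code from the linked answer.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class BrCollapser
{
    // Hypothetical helper: keeps the first <br> of each run and drops the rest.
    public static string CollapseBrTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches the XPath.
        var brNodes = doc.DocumentNode.SelectNodes("//br");
        if (brNodes == null)
        {
            return html;
        }

        var removed = new HashSet<HtmlNode>();
        foreach (var br in brNodes)
        {
            if (removed.Contains(br))
            {
                continue; // already dropped as part of an earlier run
            }

            var pendingBlanks = new List<HtmlNode>();
            var node = br.NextSibling;
            while (node != null)
            {
                var following = node.NextSibling; // capture before any removal
                if (node.NodeType == HtmlNodeType.Text
                    && string.IsNullOrWhiteSpace(node.InnerText))
                {
                    // Whitespace is only removed if another <br> follows it.
                    pendingBlanks.Add(node);
                }
                else if (node.NodeType == HtmlNodeType.Element && node.Name == "br")
                {
                    foreach (var blank in pendingBlanks)
                    {
                        blank.Remove();
                    }
                    pendingBlanks.Clear();
                    node.Remove();
                    removed.Add(node);
                }
                else
                {
                    break; // real content ends the run
                }
                node = following;
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the parser normalizes the tag name, <br>, <br/> and <BR /> are all found by the same "//br" query, which is the main advantage over a hand-written pattern.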

Upvotes: 5

Ant P

Reputation: 25221

If you need to account for the case where there is whitespace between the tags, try the following regex:

myInputStr = Regex.Replace(myInputStr,
    @"([\b\s]*<[\b\s]*[bB][rR][\s]*/?[\b\s]*>){2,}",
    "<br>", RegexOptions.Multiline);

This regex replaces two or more consecutive <br> tags with a single one, regardless of how each tag is written (spacing, casing, self-closing, etc.).
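For illustration, here is a simplified sketch of the same idea. It assumes RegexOptions.IgnoreCase in place of the [bB][rR] classes and leaves out the \b entries (inside a character class, \b means a backspace character in .NET, not a word boundary). The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BrRegexDemo
{
    // Two or more <br> variants in a row, each optionally preceded by whitespace.
    private static readonly Regex RepeatedBr = new Regex(
        @"(\s*<\s*br\s*/?\s*>){2,}",
        RegexOptions.IgnoreCase);

    // Replaces every run of consecutive <br> tags with a single <br>.
    public static string CollapseBr(string input)
    {
        return RepeatedBr.Replace(input, "<br>");
    }
}
```

Calling BrRegexDemo.CollapseBr("a<br><BR />b") gives "a<br>b", while a lone <br> is left untouched. Note that whitespace in front of the first tag of a run is part of the match, so it is removed as well.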

Upvotes: 4

Hossein Narimani Rad

Reputation: 32481

EDIT: If you don't know how many <br> you have, you can do this:

  1. Split your string with <br> and remove empty entries.
  2. Join the string with single <br>

Here is the code:

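using System.Linq; // needed for Where

string yourString = "choose to still go on the trip. <br><br>\r\nNote that when booking";

// Split on <br>, then drop pieces that are empty or contain only whitespace.
var temp =
    yourString.Split(new string[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries)
              .Where(i => !string.IsNullOrWhiteSpace(i));

// Rejoin the remaining pieces with exactly one <br> between them.
string result = string.Join("<br>", temp);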

Upvotes: 2

VVS

Reputation: 19604

Regex.Replace(input, @"(<br\s*/{0,1}>\s*(</\s*br>)*){2,}", "<br>", 
    RegexOptions.CultureInvariant | 
    RegexOptions.IgnoreCase |
    RegexOptions.Multiline);

Replaces any two or more occurrences of <br> or <br/> or <br></br> with a single <br>.

It also takes whitespace into account: <br > matches, as do <br /> and <br > </ br>.

RegexOptions.Multiline only changes how ^ and $ match, and this pattern uses neither, so you can omit it. The \s* already matches a "\r\n" that follows a tag.
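As a quick check against the string from the question, the pattern can be wrapped in a small helper. This is only a sketch: the class and method names are hypothetical, and RegexOptions.Multiline is left out on the assumption that it has no effect on a pattern without ^ or $.

```csharp
using System.Text.RegularExpressions;

public static class QuestionInputCheck
{
    // Applies the pattern from this answer, minus RegexOptions.Multiline.
    public static string Collapse(string input)
    {
        return Regex.Replace(
            input,
            @"(<br\s*/{0,1}>\s*(</\s*br>)*){2,}",
            "<br>",
            RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
    }
}
```

For the input "choose to still go on the trip. <br><br>\r\nNote that when booking", the match covers both tags and the trailing "\r\n", so the result is "choose to still go on the trip. <br>Note that when booking", which is the output the question asks for.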

Upvotes: 0

Postback

Reputation: 639

As Martin Eden suggested:

while (text.Contains("<br><br>")) 
{ 
    text = text.Replace("<br><br>", "<br>"); 
}    

or

string newString = oldString.Replace("<br><br><br>", "<br>");
newString = newString.Replace("<br><br>", "<br>");

Add more such lines, longest run of <br> first, for as many consecutive tags as you expect.

Upvotes: 0
