Nandakumar V

Reputation: 4625

JavaScript regular expression to remove unwanted <br>, &nbsp;

I have a JS string like this:
&lt;div id="grouplogo_nav"&gt;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;ul&gt;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;li&gt;&lt;a class="group_hlfppt" target="_blank" href="http://www.hlfppt.org/"&gt;&amp;nbsp;&lt;/a&gt;&lt;/li&gt;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/ul&gt;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;/div&gt;

I need to remove all <br> and &nbsp; that appear only between &gt; and &lt;. I tried to write a regular expression, but didn't get it right. Does anybody have a solution?

EDIT:

Please note that I want to remove only the tags between &gt; and &lt;.

Upvotes: 5

Views: 13169

Answers (6)

Karan Kapadni

Reputation: 45

This worked for me, including on multi-line strings (the m flag only changes how ^ and $ behave, so it is the g flag that does the work here):

myString = myString.replace(/(&nbsp;|<br>|<br \/>)/gm, '');
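A quick way to check what this pattern does is to run it on a small sample. The sketch below assumes a shortened, hypothetical multi-line string in the same style as the question's; it shows that the m flag makes no difference, and that the pattern also strips a <br> that is not between &gt; and &lt;.

```javascript
// Hypothetical, shortened multi-line sample in the same style as the question's string.
const sample = "&lt;ul&gt;<br>&nbsp;&nbsp; &lt;li&gt;x&lt;/li&gt;<br />\n&nbsp; &lt;/ul&gt;";

// The pattern from this answer, with and without the m flag.
const withM = sample.replace(/(&nbsp;|<br>|<br \/>)/gm, "");
const withoutM = sample.replace(/(&nbsp;|<br>|<br \/>)/g, "");

console.log(JSON.stringify(withM)); // "&lt;ul&gt; &lt;li&gt;x&lt;/li&gt;\n &lt;/ul&gt;"
console.log(withM === withoutM);    // true: no ^ or $ in the pattern, so m changes nothing

// The pattern does not look at its surroundings, so a <br> that is not
// between &gt; and &lt; is removed as well.
const plain = "line one<br>line two".replace(/(&nbsp;|<br>|<br \/>)/gm, "");
console.log(plain);                 // line oneline two
```

So this answer removes every &nbsp; and <br>, which only matches the question's requirement when all of them happen to sit between tags.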

Upvotes: 0

eminor

Reputation: 933

s.replace(/(&gt;)(?:&nbsp;|<br>)+(\s?&lt;)/g,'$1$2');

Don't use this in production. See the answer from Phil H.

Edit: Let me try to explain it a bit; I hope my English is good enough.

Basically, there are two different kinds of parentheses here. The first and third pairs, (), are normal capturing parentheses: they remember the characters matched by the enclosed pattern and group them together. For the second pair we don't need to remember the characters for later use, so we switch off the "remember" behaviour with the (?:) form and use it only for grouping, so that the + applies to the whole alternation. The + quantifier means "one or more occurrences", so &nbsp; or <br> must appear at least once. The last part, (\s?&lt;), matches an optional whitespace character (\s followed by ?, meaning zero or one occurrence), followed by the characters &lt;. In the replacement string, $1 and $2 are placeholders that are replaced by the text captured by the first and second capturing groups, which are the first and third pairs of parentheses (the non-capturing group is not numbered).

MDN provides a nice table, which explains all the special characters.
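To see the numbering of the capturing groups in action, the sketch below runs the same pattern on a hypothetical fragment of the question's string that also contains one <br> not between &gt; and &lt;. It compares the '$1$2' replacement string with the equivalent replacer-function form, where the captures arrive as the second and third arguments.

```javascript
// Same pattern as in this answer. The middle group (?:...) is non-capturing,
// so the two capturing groups are numbered 1 and 2.
const tagGap = /(&gt;)(?:&nbsp;|<br>)+(\s?&lt;)/g;

// Hypothetical sample: a fragment of the question's string plus one <br>
// (between "a" and "b") that is NOT between &gt; and &lt;.
const s = "&lt;ul&gt;<br>&nbsp;&nbsp; &lt;li&gt;a<br>b&lt;/li&gt;";

// Replacement-string form: $1 and $2 insert the captured text.
const viaString = s.replace(tagGap, "$1$2");

// Equivalent function form: (fullMatch, group1, group2, ...).
const viaFunction = s.replace(tagGap, (match, gt, lt) => gt + lt);

console.log(viaString);                 // &lt;ul&gt; &lt;li&gt;a<br>b&lt;/li&gt;
console.log(viaString === viaFunction); // true
```

Only the run after &lt;ul&gt; is collapsed; the <br> inside the list item survives because it is not preceded by &gt;.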

Upvotes: 1

Phil H

Reputation: 20151

Avoid using regex on HTML!

Try creating a temporary div from the string and using the DOM to remove any br elements from it. This is much more robust than parsing HTML with regex, which can be harmful to your health:

var tempDiv = document.createElement('div');
tempDiv.innerHTML = mystringwithBRin;
// childNodes is a live NodeList of the div's direct children
var nodes = tempDiv.childNodes;
for (var nodeId = nodes.length - 1; nodeId >= 0; --nodeId) {
    // nodeName (like tagName) is upper-case, "BR", for HTML elements
    if (nodes[nodeId].nodeName === 'BR') {
        tempDiv.removeChild(nodes[nodeId]);
    }
}
var newStr = tempDiv.innerHTML;

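Note that we iterate over the child nodes in reverse so that the indices still to be visited remain valid after a child node is removed (childNodes is a live NodeList). Also note that only direct children of the temporary div are checked; <br> elements nested more deeply are left alone.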

http://jsfiddle.net/fxfrt/
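The point about iterating in reverse can be illustrated without a browser. In the sketch below a plain array stands in for the live childNodes list (an assumption for illustration: splice shifts later entries down by one, just as removeChild does), and two adjacent "BR" entries show what a forward loop gets wrong.

```javascript
// Forward loop: after a removal, the next item slides into index i and is skipped.
function removeForward(items, unwanted) {
  const list = items.slice(); // work on a copy
  for (let i = 0; i < list.length; i++) {
    if (list[i] === unwanted) list.splice(i, 1);
  }
  return list;
}

// Backward loop: a removal only shifts indices > i, which were already visited.
function removeBackward(items, unwanted) {
  const list = items.slice(); // work on a copy
  for (let i = list.length - 1; i >= 0; i--) {
    if (list[i] === unwanted) list.splice(i, 1);
  }
  return list;
}

// Hypothetical child sequence with two adjacent <br> elements.
const children = ["text", "BR", "BR", "text"];
console.log(removeForward(children, "BR"));  // [ 'text', 'BR', 'text' ]
console.log(removeBackward(children, "BR")); // [ 'text', 'text' ]
```

A forward loop can also be made correct by decrementing i after each removal; iterating backwards simply avoids that bookkeeping.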

Upvotes: 4

Konstantin Dinev

Reputation: 34905

You need to replace globally. Also, don't forget that the <br> tag can appear self-closed, as <br />. Try this:

myString = myString.replace(/(&nbsp;|<br>|<br \/>)/g, '');
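The effect of the g flag, and the limits of listing exact spellings, can be checked on a hypothetical test string. The last pattern below is a looser variation that is not part of this answer: it allows optional whitespace, an optional slash, and any letter case.

```javascript
// Hypothetical input mixing several ways of writing a line break.
const html = "a<br>b<br />c<br/>d<BR>e";

// Without g, String.prototype.replace only replaces the first match.
const firstOnly = html.replace(/(&nbsp;|<br>|<br \/>)/, "");

// With g, every match is replaced; <br/> (no space) and <BR> still survive.
const allMatches = html.replace(/(&nbsp;|<br>|<br \/>)/g, "");

// Looser variation (not from the answer): <br>, <br/>, <br />, <BR>, ...
const loose = html.replace(/&nbsp;|<br\s*\/?>/gi, "");

console.log(firstOnly);  // ab<br />c<br/>d<BR>e
console.log(allMatches); // abc<br/>d<BR>e
console.log(loose);      // abcde
```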

Upvotes: 0

Varun Bansal

Reputation: 382

myString = myString.replace(/^(&nbsp;|<br>)+/, '');

Hope this helps.

Upvotes: -1

dsgriffin

Reputation: 68616

myString = myString.replace(/^(&nbsp;|<br>)+/, '');

... where /.../ denotes a regular expression, ^ denotes the start of the string, (&nbsp;|<br>) denotes "&nbsp; or <br>", and + denotes "one or more occurrences of the previous expression". The full match is then simply replaced with an empty string. Because of the ^ anchor, only a run of &nbsp; and <br> at the very beginning of the string is removed.
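The anchoring behaviour can be seen on a few hypothetical inputs. The sketch also shows that adding the g flag does not make the pattern reach further into the string, since without the m flag ^ can only match at index 0.

```javascript
// Pattern from this answer: ^ anchors the match to the start of the string.
const leading = /^(&nbsp;|<br>)+/;

const atStart = "<br>&nbsp;&nbsp;text".replace(leading, "");
// Nothing at index 0 to match here, so the string comes back as-is.
const inMiddle = "text<br>&nbsp;more".replace(leading, "");
// Adding g does not help: without the m flag, ^ only matches at index 0.
const withG = "<br>a<br>b".replace(/^(&nbsp;|<br>)+/g, "");

console.log(atStart);  // text
console.log(inMiddle); // text<br>&nbsp;more
console.log(withG);    // a<br>b
```

Applied to the question's string, which begins with &lt;div, this pattern finds nothing to remove.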

Upvotes: 2
