Reputation: 636
I want to get the string hello world from an HTML string like this:
Hello world! hello world! Hello world! <a href="#">hello world</a><p>hello world</p><p><a href="#">hello world</a></p>
But I don't want to match hello world inside an <a> tag. For example, these should not match:
<a href="#">hello world</a>
and
<p><a href="#">hello world</a></p>
My code:
var replacepattern = new RegExp('hello world(?![^<]*>)',"ig");
This returns every hello world in the string. Any ideas?
EDIT:
I use (?![^<]*>) to handle cases like <p title="hello world"> hello world</p>, so that I don't match the hello world occurrences inside tag attributes.
EDIT 2:
I want to return the string:
'<a href="#hello world">Hello world</a>! <a href="#hello world">Hello world</a>! <a href="#hello world">Hello world</a>! <a href="#">Hello world</a><p><a href="#hello world">Hello world</a></p><p><a href="#">Hello world</a></p>'
Upvotes: 0
Views: 723
Reputation: 1558
JavaScript regular expressions support negative lookahead, so you can try this:
(?![^>]*<\/[a-zA-Z]>)(Hello world)
Demo: https://regex101.com/r/rDPp0t/2/
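To see what this pattern selects on the question's string, here is a small check. The gi flags are an assumption (the answer lists only the pattern), and the onlyAnchors variant is my own narrowing of the character class, not part of the answer.

```javascript
var str = 'Hello world! hello world! Hello world! <a href="#">hello world</a>' +
          '<p>hello world</p><p><a href="#">hello world</a></p>';

// Pattern from the answer; the "gi" flags are assumed.
var anyOneLetterTag = /(?![^>]*<\/[a-zA-Z]>)(Hello world)/gi;

// Hypothetical narrowing: only reject text that runs straight into </a>.
var onlyAnchors = /(?![^>]*<\/a>)(hello world)/gi;

var first = str.match(anyOneLetterTag);
var second = str.match(onlyAnchors);

console.log(first.length);  // 3: the text inside <p>...</p> is skipped as well
console.log(second.length); // 4: the <p> text is now included
```

Because [a-zA-Z] matches any one-letter closing tag, the text in <p>hello world</p> is rejected along with the anchors. Both versions only look at the text that runs directly up to the closing tag, so treat them as a starting point.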
Upvotes: 0
Reputation: 15000
This expression will match the hello world substrings which are outside the anchor tags.
Regex
((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)
Theory:
((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)
Captures the anchor tags, and any text outside the anchor tags which is not hello world. This is group 1.
(hello\sworld\s\d+)
Captures the hello world. This is group 2. Since I added digits in my sample text to help show which substrings were being captured, I also added the \s\d+ to this section. Yes, arguably this is beyond your original scope. :)
((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)
Captures the anchor tags, and any text outside the anchor tags which is not hello world. This is group 3. It's an identical pattern to group 1, but is required or else you might encounter odd results on the last match in the string.
Replace With
In the samples below I used this replacement to help make it more obvious what's happening:
$1_______$3
To replace your hello world strings with anchor tags, you could use this instead:
$1<a href="$2">$2</a>$3
Sample text
Note the difficult edge cases in the anchor tag with the onmouseover attribute. I also added numbers to each hello world so they are easier for us humans to read.
<a href="#">hello world 00</a>Hello world 1! hello world 2! Hello world 3! <a onmouseover=' a=1; href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href="#">hello world 04</a><p>hello world 5</p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa
Sample Javascript
<script type="text/javascript">
var re = /((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)/gi; // g: replace every match, i: ignore case
var sourcestring = "source string to match with pattern";
var replacementpattern = '$1<a href="$2">$2</a>$3'; // single quotes so the inner double quotes need no escaping
var result = sourcestring.replace(re, replacementpattern);
alert("result = " + result);
</script>
String After Replacement
This is just to show what's happening, using the first "replace with":
<a href="#">hello world 00</a>_______! _______! _______! <a href="#">hello world 04</a><p>_______</p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa
This uses the second "replace with" to show that it actually works:
<a href="#">hello world 00</a><a href="Hello world 1">Hello world 1</a>! <a href="hello world 2">hello world 2</a>! <a href="Hello world 3">Hello world 3</a>! <a onmouseover=' a=1; href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href="#">hello world 04</a><p><a href="hello world 5">hello world 5</a></p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa
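If the full attribute-aware pattern is more than you need, a shorter variant of the same idea is to let one regex consume whole anchors and other tags first, and only capture hello world when neither of those branches matched. This is a sketch, not the pattern above: linkOutsideAnchors is a made-up helper name, and it assumes no nested anchors and no > inside the attribute values of non-anchor tags.

```javascript
// Hypothetical helper: wraps "hello world" in a link unless it is already
// inside an <a>...</a> element or inside a tag's attributes.
function linkOutsideAnchors(html) {
  // Branch 1 consumes a whole anchor element, branch 2 consumes any other tag,
  // branch 3 (the only capturing group) is the text we actually want.
  var re = /<a\b[^>]*>[\s\S]*?<\/a>|<[^>]*>|(hello world)/gi;
  return html.replace(re, function (match, phrase) {
    // `phrase` is undefined when one of the two "skip" branches matched.
    return phrase === undefined
      ? match
      : '<a href="#' + phrase + '">' + phrase + '</a>';
  });
}

var sample = 'Hello world! <a href="#">hello world</a>' +
             '<p title="hello world">hello world</p>';
console.log(linkOutsideAnchors(sample));
```

The callback form of replace is what makes this work: matches from the skip branches are returned unchanged, so only the captured text gets rewritten.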
Upvotes: 1
Reputation: 13529
I think that this will work:
var str = 'Hello > world <! Hello > world <! Hello > world <! <a href="#">Hello > world <</a><p>Hello > world <</p><p><a href="#">Hello > world <</a></p>';
var textToReplace = 'Hello > world <'
var re = new RegExp('(?!(^<*(href=)*(>)))' + textToReplace + '(?!(</a>))',"ig");
var result = str.replace(re, '@');
console.log(result);
The result is
@! @! @! <a href="#">Hello > world <</a><p>@</p><p><a href="#">Hello > world <</a></p>
Is that what you want to achieve?
JsFiddle -> http://jsfiddle.net/Che3v/1/
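One thing worth checking before relying on this: as far as I can tell, the pattern only skips an occurrence that is immediately followed by </a>. The sketch below reuses the regex from above on two made-up inputs to show the difference.

```javascript
var textToReplace = 'Hello > world <';
var re = new RegExp('(?!(^<*(href=)*(>)))' + textToReplace + '(?!(</a>))', 'ig');

// Text directly followed by </a> is left alone...
var direct = '<a href="#">Hello > world <</a>'.replace(re, '@');

// ...but any extra text before </a> lets the match through.
var trailing = '<a href="#">Hello > world < again</a>'.replace(re, '@');

console.log(direct);   // <a href="#">Hello > world <</a>
console.log(trailing); // <a href="#">@ again</a>
```

So this works for the sample string, where the link text is exactly the search text, but not for links that contain anything after it.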
Upvotes: -1
Reputation: 276306
Let's say you have that HTML in a string:
var str = 'Hello world! hello world! Hello world! <a href="#">hello world</a><p>hello world</p><p><a href="#">hello world</a></p>';
Instead of coming up with complicated regex patterns to match it, we'll put that HTML in a container element and use the DOM API that every browser exposes to JavaScript to process it.
var el = document.createElement("div");
el.innerHTML = str;
Now, let's get all the <a> tags from our element and remove them ourselves:
var aTags = el.getElementsByTagName("a");
while (aTags.length > 0) { // while the element still has <a> tags
    aTags[0].parentNode.removeChild(aTags[0]); // remove the first one; the live collection shrinks
}
Now we can get the HTML back and read the remaining content:
el.innerHTML;
This now is:
"Hello world! hello world! Hello world! <p>hello world</p><p></p>"
Now, if we just want the text without the tags, we can do that too.
el.textContent;
Will evaluate to:
"Hello world! hello world! Hello world! hello world"
Upvotes: 1
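Building on the DOM approach above, the same container element can also be used to produce the linked string from EDIT 2 rather than just removing the anchors. This is only a sketch: linkifyOutsideAnchors is a hypothetical helper name, it needs a browser (or another environment that provides document), and the case-insensitive search assumes lower-casing does not change the length of the text.

```javascript
// Hypothetical helper: wraps each occurrence of `phrase` in a link,
// but only in text nodes that are not already inside an <a> element.
function linkifyOutsideAnchors(html, phrase) {
  if (!phrase) return html; // nothing to search for

  var el = document.createElement("div");
  el.innerHTML = html;

  // Collect the text nodes first, because the tree is modified afterwards.
  var walker = document.createTreeWalker(el, NodeFilter.SHOW_TEXT);
  var textNodes = [];
  while (walker.nextNode()) {
    var node = walker.currentNode;
    if (!node.parentNode.closest("a")) { // skip text that lives inside a link
      textNodes.push(node);
    }
  }

  var needle = phrase.toLowerCase();
  textNodes.forEach(function (node) {
    var text = node.nodeValue;
    var haystack = text.toLowerCase();
    var frag = document.createDocumentFragment();
    var pos = 0;
    var idx;
    while ((idx = haystack.indexOf(needle, pos)) !== -1) {
      // Keep the text before the match, then add a link for the match itself.
      frag.appendChild(document.createTextNode(text.slice(pos, idx)));
      var found = text.slice(idx, idx + phrase.length);
      var a = document.createElement("a");
      a.setAttribute("href", "#" + found);
      a.textContent = found;
      frag.appendChild(a);
      pos = idx + phrase.length;
    }
    frag.appendChild(document.createTextNode(text.slice(pos)));
    node.parentNode.replaceChild(frag, node);
  });

  return el.innerHTML;
}

// Only run the demo where a DOM is available.
if (typeof document !== "undefined") {
  console.log(linkifyOutsideAnchors(
    'Hello world! <a href="#">hello world</a><p>hello world</p>', "hello world"));
}
```

Since the search only ever sees text nodes, attribute values such as title="hello world" are never touched, which is the case the lookahead in the question was trying to cover.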