Reputation: 19
I have this text:
<body>
<span class="Forum"><div align="center"></div></span><br />
<span class="Topic">Text</span><br />
<hr />
<b>Text</b> Text<br />
<hr width=95% class="sep"/>
Text<a href="Text" target="_blank">Text</a>
<hr />
<b>Text</b> -Text<br />
<hr width=95% class="sep"/>
**Text what i need.**
<hr />
and my RegEx for "Text what I need" is /"sep"(.*)hr/m.
It doesn't match. Why?
Upvotes: 1
Views: 80
Reputation: 344783
The dot (.) doesn't match newlines in JavaScript regular expressions. Try:
/"sep"([\s\S]*)hr/m
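To see the difference, here is a minimal sketch that can be run in Node. The sample string is a hypothetical stand-in modeled on the question's markup, and the variable names are made up for illustration. It also swaps in a lazy quantifier (`*?`) and the `g` flag, which are my additions: a greedy `[\s\S]*` would run from the first "sep" all the way to the last "hr" in the string.

```javascript
// Hypothetical sample input modeled on the question's markup.
const html = [
  '<hr width=95% class="sep"/>',
  'Text<a href="Text" target="_blank">Text</a>',
  '<hr />',
  '<b>Text</b> -Text<br />',
  '<hr width=95% class="sep"/>',
  'Text what I need.',
  '<hr />',
].join('\n');

// "." stops at line breaks, so this pattern cannot span several lines.
const dotPattern = /"sep"\/>(.*)<hr/;

// [\s\S] matches any character, newlines included; *? keeps each match as short as possible.
const anyCharPattern = /"sep"\/>([\s\S]*?)<hr/g;

const dotMatch = dotPattern.exec(html);

// Collect the captured text after every "sep" separator, then take the last one.
const captures = Array.from(html.matchAll(anyCharPattern), m => m[1].trim());
const lastCapture = captures[captures.length - 1];

console.log(dotMatch);
console.log(lastCapture);
```

Which capture you want (first, last, or all of them) depends on the real page, so treat the "take the last one" step as an assumption about this particular input.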
IMO, you're much better off with a different approach: regex isn't ideal for extracting data from HTML. A better method is to create a div, set its innerHTML property to the HTML string you have, then use DOM traversal to find the text node you need.
Here's an example of what I mean: http://www.jsfiddle.net/W33n6/. It uses the following code to get the text:
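// `html` is assumed to hold the HTML string from the question.
var div = document.createElement("div");
div.innerHTML = html;
var hrs = div.getElementsByTagName("hr");
for (var i = 0; i < hrs.length; i++) {
    // Stop at the first <hr> whose class attribute is exactly "sep".
    if (hrs[i].className == "sep") {
        // Display the value of the node that immediately follows it.
        document.body.innerHTML = hrs[i].nextSibling.nodeValue;
        break;
    }
}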
EDIT: Gumbo's version is a little stricter than mine, checking for the "sep" class among other classes and ensuring the node following is a text node.
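To make those two differences concrete, here is a rough sketch that runs without a browser. `hasClass`, `textAfterSeparator`, and the mock objects are hypothetical names introduced for illustration; the plain objects only imitate the `className`, `nextSibling`, `nodeType`, and `nodeValue` properties of real DOM nodes.

```javascript
// Numeric value of Node.TEXT_NODE in the DOM.
const TEXT_NODE = 3;

// True when `token` is one of the whitespace-separated class names.
function hasClass(className, token) {
  return className.split(/\s+/).includes(token);
}

// Hypothetical helper: text directly after the first matching <hr>, or null if none qualifies.
function textAfterSeparator(hrs, token) {
  for (const hr of Array.from(hrs)) {
    const next = hr.nextSibling;
    if (hasClass(hr.className, token) && next && next.nodeType === TEXT_NODE) {
      return next.nodeValue;
    }
  }
  return null;
}

// Plain objects standing in for <hr> elements and their next siblings.
const mockHrs = [
  { className: '', nextSibling: { nodeType: 1, nodeValue: null } },
  // Has the class, but is followed by an element rather than a text node.
  { className: 'wide sep', nextSibling: { nodeType: 1, nodeValue: null } },
  { className: 'wide sep', nextSibling: { nodeType: TEXT_NODE, nodeValue: 'Text what I need.' } },
];

// The exact comparison finds nothing once a second class is present.
const looseHit = mockHrs.find(hr => hr.className == 'sep');
const strictText = textAfterSeparator(mockHrs, 'sep');

console.log(looseHit);
console.log(strictText);
```

In a page, the same helper could presumably be called as `textAfterSeparator(div.getElementsByTagName("hr"), "sep")`.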
Upvotes: 1
Reputation: 655785
Don't use a regular expression; use DOM methods instead:
var elems = document.getElementsByTagName("hr");
for (var i=0; i<elems.length; ++i) {
    var elem = elems[i];
    // Match "sep" as a whole class name, possibly among other classes,
    // and require the following node to be a text node.
    if (/(?:^|\s+)sep(?:\s|$)/.test(elem.className) &&
            elem.nextSibling && elem.nextSibling.nodeType === Node.TEXT_NODE) {
        var text = elem.nextSibling.nodeValue;
        break;
    }
}
This selects all HR elements, checks whether each one has the class sep, and grabs the next sibling node if it is a text node.
Upvotes: 2