guest

Reputation: 19

RegEx JavaScript problem

I have this text:

<body> 
<span class="Forum"><div align="center"></div></span><br /> 
<span class="Topic">Text</span><br /> 

   <hr /> 
  <b>Text</b> Text<br /> 
  <hr width=95% class="sep"/> 
  Text<a href="Text" target="_blank">Text</a> 
   <hr /> 
  <b>Text</b> -Text<br /> 
  <hr width=95% class="sep"/> 
 **Text what i need.**
   <hr /> 

My regex for extracting "Text what I need" is /"sep"(.*)hr/m, but it doesn't work. Why?
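A quick way to see the failure is to run the pattern against a trimmed copy of the end of the question's HTML, in Node or a browser console. This is a minimal sketch; the variable name html is an assumption, not something from the question.

```javascript
// A trimmed copy of the end of the question's HTML
const html = [
  '  <b>Text</b> -Text<br />',
  '  <hr width=95% class="sep"/>',
  ' **Text what i need.**',
  '   <hr />',
].join("\n");

// "." never matches a line terminator such as "\n"
console.log(/./.test("\n")); // false

// So "sep" and "hr" would have to sit on the same line, and here they don't
console.log(html.match(/"sep"(.*)hr/m)); // null

// The m flag only changes how ^ and $ behave, so removing it changes nothing
console.log(html.match(/"sep"(.*)hr/)); // null
```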

Upvotes: 1

Views: 80

Answers (2)

Andy E

Reputation: 344783

The . metacharacter doesn't match newlines in JavaScript regular expressions, and the m flag doesn't change that (it only affects ^ and $). Try:

/"sep"([\s\S]*)hr/m
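A sketch of how [\s\S] behaves on a trimmed copy of the question's HTML (the names html, greedy, lazy and dotAll are illustrative assumptions). Because * is greedy, the capture runs from the first "sep" to the last "hr", so it also swallows the text after the first separator. A lazy *? stops at the next <hr, and the g flag with String.prototype.matchAll collects the text after every "sep" rule. In engines that support ES2018, the s (dotAll) flag lets . match newlines too, so it can stand in for [\s\S].

```javascript
// A trimmed copy of the question's HTML, covering both "sep" rules
const html = [
  '  <hr width=95% class="sep"/>',
  '  Text<a href="Text" target="_blank">Text</a>',
  '   <hr />',
  '  <b>Text</b> -Text<br />',
  '  <hr width=95% class="sep"/>',
  ' **Text what i need.**',
  '   <hr />',
].join("\n");

// Greedy: [\s\S]* runs from the FIRST "sep" all the way to the LAST "hr"
const greedy = html.match(/"sep"([\s\S]*)hr/)[1];

// Lazy: [\s\S]*? stops at the first "<hr" after each "sep"; /g finds every match
const lazy = [...html.matchAll(/"sep"\/>([\s\S]*?)<hr/g)].map((m) => m[1].trim());

// ES2018 dotAll flag: "." also matches newlines, so this behaves like [\s\S]
const dotAll = [...html.matchAll(/"sep"\/>(.*?)<hr/gs)].map((m) => m[1].trim());

console.log(lazy);
// [ 'Text<a href="Text" target="_blank">Text</a>', '**Text what i need.**' ]
```

The last entry of lazy is the text after the second "sep" rule, which is the part the question asks for.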

IMO, you're much better off taking a different approach; regex isn't well suited to extracting data from HTML. A better method is to create a div, set its innerHTML property to your HTML string, and then use DOM traversal to find the text node you need.

Here's an example of what I mean: http://www.jsfiddle.net/W33n6/. It uses the following code to get the text:

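// `html` holds the HTML string from the question
var div = document.createElement("div");
div.innerHTML = html; // parse the string into a detached DOM tree
var hrs = div.getElementsByTagName("hr");

for (var i = 0; i < hrs.length; i++) {
    // the first <hr> whose class attribute is exactly "sep"
    if (hrs[i].className === "sep") {
        // the text right after it is a text node; write its value into the page
        document.body.innerHTML = hrs[i].nextSibling.nodeValue;
        break;
    }
}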

EDIT: Gumbo's version is a little stricter than mine: it finds the "sep" class even when the element has other classes too, and it makes sure the following node is a text node.

Upvotes: 1

Gumbo

Reputation: 655785

Don’t use regular expressions; use DOM methods instead:

var elems = document.getElementsByTagName("hr");
var text;
for (var i = 0; i < elems.length; ++i) {
    var elem = elems[i];
    // "sep" must appear as a whole class name, possibly among other classes
    if (/(?:^|\s+)sep(?:\s|$)/.test(elem.className) &&
        elem.nextSibling && elem.nextSibling.nodeType === Node.TEXT_NODE) {
        text = elem.nextSibling.nodeValue;
        break;
    }
}

This selects all HR elements, checks whether each one has the class sep, and grabs the next sibling node if it is a text node.
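The class test can be tried on its own with plain strings, no DOM needed. This sketch wraps the answer's regex in a hypothetical helper, hasSepClass, to show that it accepts "sep" alongside other classes but rejects class names that merely contain "sep". In current browsers, elem.classList.contains("sep") performs the same check without a regex.

```javascript
// Hypothetical helper wrapping the whole-word class check from the answer
const hasSepClass = (className) => /(?:^|\s+)sep(?:\s|$)/.test(className);

const samples = ["sep", "line sep", "sep wide", "separator", "mysep", ""];
for (const name of samples) {
  // prints each sample (quoted) followed by true or false
  console.log(JSON.stringify(name), hasSepClass(name));
}
// "sep", "line sep" and "sep wide" give true;
// "separator", "mysep" and "" give false
```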

Upvotes: 2
