JavaScript regex not working as intended

Question

I have the HTML of a page stored in a variable as plain text, and I need to remove some parts of it. This is the part of the HTML that I need to change:


    
        
    RuneRifle
    op 24.08.2012 om 21:41 uur
    Citaat Bewerken
    Testforum

These replaces work:

pageData = pageData.replace(/href=\".*?\"/g, "href=\"#\"");
pageData = pageData.replace(/target=\".*?\"/g, "");

But this replace does not work at all:

pageData = pageData.replace(
  /(.*?)<\/span>/g, "");

I need to remove every span with the class postheader_right and everything in it, but it just doesn't work. My knowledge of regex isn't that great, so I'd appreciate it if you could explain how you arrived at your answer and briefly how it works.

Mike Samuel · Accepted Answer

> I need to remove every span with the class postheader_right and everything in it, but it just doesn't work.

Don't use regular expressions to find the spans; see "Using regular expressions to parse HTML: why not?" for the reasons. Walking the DOM instead,

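// getElementsByTagName returns a live collection, so iterate backwards:
// removing a span then never shifts the indices still to be visited.
var allSpans = document.getElementsByTagName('span');
for (var i = allSpans.length; --i >= 0;) {
  var span = allSpans[i];
  // \b anchors at word boundaries, so e.g. "postheader_rights" does not match.
  if (/\bpostheader_right\b/.test(span.className)) {
    span.parentNode.removeChild(span);
  }
}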

should do it.
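
For context, here is a small sketch of why a replace of the kind shown in the question tends to fail. The opening part of the pattern is assumed to be the literal tag `<span class="postheader_right">`, and the sample markup is made up for illustration; only the class name comes from the question.

```javascript
// Hypothetical sample markup: the class name is from the question,
// the surrounding structure is assumed for illustration.
var sample =
  '<div>\n' +
  '  <span class="postheader_right">\n' +
  '    <a href="#">Citaat</a> <a href="#">Bewerken</a>\n' +
  '  </span>\n' +
  '</div>';

// `.` does not match line breaks, so `.*?` cannot cross the newlines
// inside the span and this pattern never matches.
var dotPattern = /<span class="postheader_right">(.*?)<\/span>/g;
var unchanged = sample.replace(dotPattern, '');

// `[\s\S]` matches any character, including line breaks.
var anyCharPattern = /<span class="postheader_right">([\s\S]*?)<\/span>/g;
var stripped = sample.replace(anyCharPattern, '');

console.log(unchanged === sample);                  // true
console.log(stripped.indexOf('postheader_right'));  // -1

// Even the multi-line pattern breaks on nested spans: the lazy match
// stops at the first </span> and leaves the outer closing tag behind.
var nested = '<span class="postheader_right"><span>x</span> y</span>';
var leftover = nested.replace(anyCharPattern, '');
console.log(leftover);                              // " y</span>"
```

The last case is the kind of input that makes regex-based HTML editing fragile, which is why the DOM approach is preferable.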

If you only need to support newer browsers, then getElementsByClassName makes it even easier:

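For example, to find all div elements that have a class of 'test':

// getElementsByClassName returns an array-like collection, not an Array,
// so borrow filter from Array.prototype.
var tests = Array.prototype.filter.call(document.getElementsByClassName('test'), function (elem) {
  return elem.nodeName === 'DIV';
});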
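
Since the question holds the HTML in a string rather than in the live document, the same DOM-based idea can be applied by parsing the string first. This is a minimal sketch, assuming a browser environment where `DOMParser` is available; `stripSpansByClass` is a hypothetical helper name and not part of the original answer.

```javascript
// Hypothetical helper (name chosen for illustration). Assumes a browser,
// where DOMParser and querySelectorAll are available.
function stripSpansByClass(html, className) {
  // Parse the string into a detached document; the live page is untouched.
  var doc = new DOMParser().parseFromString(html, 'text/html');

  // querySelectorAll returns a static list, so forward iteration is safe
  // even while nodes are being removed.
  var spans = doc.querySelectorAll('span.' + className);
  for (var i = 0; i < spans.length; i++) {
    spans[i].parentNode.removeChild(spans[i]);
  }

  // Serialize the body contents back to a string.
  return doc.body.innerHTML;
}
```

Usage would look like `pageData = stripSpansByClass(pageData, 'postheader_right');`. Note that `doc.body.innerHTML` only returns the body contents; if `pageData` holds a full page including the head, returning `doc.documentElement.outerHTML` may be the better choice.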
