Vad

Reputation: 3740

JavaScript Regex to Extract Text from Style HTML Tags

I am trying to use a JavaScript regex to extract all the text between <style> tags:

 var rawHtml = "<style type='text/css'> div { color: red; } </style>";
 //var rawHtml = "<style type=\"text/css\"> div { color: red; } </style>";
 //var rawHtml = "<style> div { color: red; } </style>";
 var cssString = rawHtml.match(/<style[^>]*>(.+?)<\/style>/gi);
 console.log(cssString);

The style tag may have attributes, quoted with either single or double quotes. How can I extract the CSS in all of these cases? My regex is not picking it up.

Upvotes: 0

Views: 5172

Answers (2)

Jaifroid

Reputation: 447

I think the main problem in your code is that you're reading the whole result of match() rather than the part matched in parentheses. You need something like:

var innerHTML = cssString ? cssString[1] : "";

The important part here is that the parenthesized match from your regex, (.+?), is stored in capture group 1, i.e. in cssString[1], not in cssString itself.
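To make the indexing concrete, here is a minimal sketch using the question's sample string. It assumes the g flag is dropped for the single-match case; the variable names single, innerCss and all are illustrative only.

```javascript
// Sample input taken from the question.
const rawHtml = "<style type='text/css'> div { color: red; } </style>";

// Without the g flag, match() returns the full match at index 0
// and the first capture group at index 1.
const single = rawHtml.match(/<style[^>]*>(.+?)<\/style>/i);
const innerCss = single ? single[1] : "";
console.log(innerCss); // " div { color: red; } "

// With the g flag, match() returns only the full matches;
// the capture groups are not included in the result.
const all = rawHtml.match(/<style[^>]*>(.+?)<\/style>/gi);
console.log(all.length); // 1
console.log(all[0] === rawHtml); // true
```

So with /g, index 1 of the result would be the second full match (if any), not the captured CSS.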

However, I'd also make a small change to make your regex more robust:

/<style[^>]*>([^<]+)<\/style>/i

Here we're matching "anything that is not a <" in the capture group. Since the code inside the style tags could span more than one line, .* or .+ is not a great way to match "everything", because in JavaScript the dot doesn't match line breaks (unless the s flag is set). You can use negated character classes instead. To match absolutely anything, use [\s\S]* (any character, zero or more times) or [\s\S]+ (any character, one or more times). However, here you want to make sure the match stops at the next <. I removed the question mark because you don't need to make the quantifier lazy if the regex can't go past the next <.
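A rough comparison of the three approaches on a multi-line input might look like the sketch below. The input string is hypothetical (it is not from the question) and deliberately puts a < inside a CSS comment, which is one case where [^<]+ would presumably give up; the lazy [\s\S]*? variant is shown as a possible fallback for that case.

```javascript
// Hypothetical multi-line input; the CSS contains a "<" inside a comment.
const multiLineHtml = "<style>\n/* a < b */\ndiv { color: red; }\n</style>";

// The dot does not match "\n" (without the s flag), so this finds nothing.
const dotResult = multiLineHtml.match(/<style[^>]*>(.+?)<\/style>/i);

// [^<]+ crosses line breaks but stops at the "<" in the comment,
// which is not followed by "/style>", so this finds nothing either.
const negatedResult = multiLineHtml.match(/<style[^>]*>([^<]+)<\/style>/i);

// [\s\S]*? matches any character, lazily, up to the first closing tag.
const anyCharResult = multiLineHtml.match(/<style[^>]*>([\s\S]*?)<\/style>/i);

console.log(dotResult); // null
console.log(negatedResult); // null
console.log(JSON.stringify(anyCharResult[1])); // "\n/* a < b */\ndiv { color: red; }\n"
```

If the CSS is known never to contain a <, the negated class remains the simpler choice.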

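EDIT: I've just realized you're using the global flag, which changes things: with /g, match() returns an array of full matches only and discards the capture groups, so cssString[1] would be the second full match rather than the captured text. The answer above assumes a single match, without the /g flag.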

So, to iterate over all <style> elements in a document that may have several, with your regex, you need to do something like this:

var styleMatchRegExp = /<style[^>]*>([^<]+)<\/style>/ig;
var match = styleMatchRegExp.exec(rawHtml);
var cssStringArray = [];
while (match != null) {
    cssStringArray.push(match[1]);
    match = styleMatchRegExp.exec(rawHtml);
}

You'll end up with an array (cssStringArray) containing the CSS from each of the <style>...</style> elements in your document.
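In environments that support String.prototype.matchAll (ES2020), the same loop can probably be written more compactly. This is only a sketch: the helper name extractStyleBlocks and the sample document are made up for illustration, and the regex is the one from the loop above.

```javascript
// Hypothetical helper name; not part of the original answer.
function extractStyleBlocks(html) {
  const styleRegExp = /<style[^>]*>([^<]+)<\/style>/gi;
  // matchAll requires the g flag and yields one match object per <style> element;
  // index 1 of each match object is the captured CSS.
  return Array.from(html.matchAll(styleRegExp), (m) => m[1]);
}

// Illustrative document with two style elements, using both quote styles.
const sampleDoc =
  "<style type='text/css'> div { color: red; } </style>" +
  "<p>text</p>" +
  '<style type="text/css"> p { margin: 0; } </style>';

const cssBlocks = extractStyleBlocks(sampleDoc);
console.log(cssBlocks); // [ ' div { color: red; } ', ' p { margin: 0; } ' ]
```

Unlike the exec() loop, this creates a fresh regex on each call, so there is no lastIndex state to carry over between inputs.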

Upvotes: 2

CertainPerformance

Reputation: 371168

Just use DOMParser instead:

const rawHTML = "<style type='text/css'> div { color: red; } </style>";
const doc = new DOMParser().parseFromString(rawHTML, "text/html");
const matches = [...doc.querySelectorAll('style')]
  .map(style => style.textContent);
console.log(matches);

Upvotes: 6
