Reputation: 3740
I am trying to use a JavaScript regex to extract all the text between HTML <style> tags:
var rawHtml = "<style type='text/css'> div { color: red; } </style>";
//var rawHtml = "<style type=\"text/css\"> div { color: red; } </style>";
//var rawHtml = "<style> div { color: red; } </style>";
var cssString = rawHtml.match(/<style[^>]*>(.+?)<\/style>/gi);
console.log(cssString);
The style tag may have attributes, and their values may use single or double quotes. How can I extract the CSS in all of these cases? My regex is not picking it up.
Upvotes: 0
Views: 5172
Reputation: 447
I think the main problem in your code is that you've set cssString to the full match rather than to the part matched in parentheses. You need something like:
var innerHTML = cssString ? cssString[1] : "";
The important part here is that the parenthetical match from your regex, (.+?), is stored in capture group 1, i.e. in cssString[1], not in cssString itself.
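As a minimal sketch of that difference, using the sample string from the question (the names single, innerCss and all are only illustrative): without the g flag, match() exposes the capture group at index 1, while with the g flag it returns only the full matches.

```javascript
// Same sample markup as in the question.
const rawHtml = "<style type='text/css'> div { color: red; } </style>";

// Without the g flag, match() returns one match array:
// index 0 is the whole match, index 1 is the first capture group.
const single = rawHtml.match(/<style[^>]*>(.+?)<\/style>/i);
const innerCss = single ? single[1] : "";
console.log(innerCss); // " div { color: red; } "

// With the g flag, match() returns only the full matches, without capture groups.
const all = rawHtml.match(/<style[^>]*>(.+?)<\/style>/gi);
console.log(all); // one element: the whole <style>...</style> string
```

This is why indexing the result of a global match with [1] does not give the CSS text.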
However, I'd also make a small change to make your regex more robust:
/<style[^>]*>([^<]+)<\/style>/i
Here we're matching "anything that is not a <" in the capture group. Since the code inside the style tags could span more than one line, .* or .+ is not a great way to match "everything": in JavaScript, the dot doesn't match line breaks unless the s (dotAll) flag is set. You can use negated character classes instead. To match absolutely anything, use [\s\S]* (any character, zero or more times, as many as possible) or [\s\S]+ (any character, at least once, as many as possible). However, here you want to make sure the match stops at the next <. I removed the question mark because the search doesn't need to be lazy when the regex can't go past the next <.
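A small sketch of the line-break issue, assuming a hypothetical multi-line input (multiLine and the other variable names are made up for illustration):

```javascript
// Hypothetical multi-line input, for illustration only.
const multiLine = "<style>\n  div { color: red; }\n</style>";

// The dot does not match line breaks by default, so this finds nothing.
const withDot = multiLine.match(/<style[^>]*>(.+?)<\/style>/i);
console.log(withDot); // null

// A negated character class does match line breaks.
const withNegatedClass = multiLine.match(/<style[^>]*>([^<]+)<\/style>/i);
console.log(withNegatedClass[1].trim()); // "div { color: red; }"

// [\s\S] matches any character; keep it lazy so it stops at the first </style>.
const withAnyChar = multiLine.match(/<style[^>]*>([\s\S]*?)<\/style>/i);
console.log(withAnyChar[1].trim()); // "div { color: red; }"
```

Note that the [^<]+ variant would stop early if the CSS itself contained a < character, so the lazy [\s\S]*? form may be the safer choice for arbitrary stylesheets.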
EDIT: I've just realized you're using the global flag, which changes things a bit. The answer above assumes a single match, without the /g flag.
So, to iterate over all <style> elements in a document that may have several, with your regex, you need to do something like this:
var styleMatchRegExp = /<style[^>]*>([^<]+)<\/style>/ig;
var match = styleMatchRegExp.exec(rawHtml);
var cssStringArray = [];
// With the g flag, each exec() call resumes from the end of the previous match.
while (match != null) {
    cssStringArray.push(match[1]);
    match = styleMatchRegExp.exec(rawHtml);
}
You'll end up with an array (cssStringArray) containing the CSS in each of the <style>...</style> groups in your document.
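As an alternative sketch of the same loop, assuming a runtime that supports String.prototype.matchAll (ES2020), the capture groups can be collected without calling exec() by hand. The input string and the names html, styleRegExp and cssBlocks are hypothetical.

```javascript
// Hypothetical document with two style blocks, for illustration only.
const html =
  "<style> div { color: red; } </style>" +
  "<p>text</p>" +
  '<style type="text/css"> p { margin: 0; } </style>';

// matchAll() requires the g flag and yields one match array per <style> block.
const styleRegExp = /<style[^>]*>([^<]+)<\/style>/gi;
const cssBlocks = Array.from(html.matchAll(styleRegExp), (m) => m[1].trim());

console.log(cssBlocks); // [ "div { color: red; }", "p { margin: 0; }" ]
```

The trim() call is optional; it only strips the whitespace around each block of CSS.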
Upvotes: 2
Reputation: 371168
Just use DOMParser instead:
const rawHTML = "<style type='text/css'> div { color: red; } </style>";
const doc = new DOMParser().parseFromString(rawHTML, "text/html");
const matches = [...doc.querySelectorAll('style')]
  .map(style => style.textContent);
console.log(matches);
Upvotes: 6