Smit

Reputation: 609

Regex that matches only inside a tag

Using JavaScript, I'm trying to match any attribute whose name starts with "on" (onerror, onmouseover, etc.) together with its value. My attempt:

/<*?(on[^=-\s]+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?/gmi

(online example: https://www.regex101.com/r/dQ4xH4/1)

But I want this regular expression to work only inside tags (between the '<' and '>' characters). As you can see in the current example, the regex also matches outside of tags. How can I modify my regex so that it matches only inside tags (any tags)?
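To see the problem concretely, here is a small check that can be pasted into Node.js or a browser console. The input string is made up for illustration. Note that `<*?` at the start of the pattern means "zero or more `<`, as few as possible", so it lazily matches nothing and never ties the match to a tag.

```javascript
// The regex from the question, unchanged.
const questionRegex = /<*?(on[^=-\s]+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?/gmi;

// Illustrative input: plain text with no '<' or '>' anywhere.
const plainText = "no tags here onclick=alert(1)";

// With the g flag, String.prototype.match returns every full match.
const matches = plainText.match(questionRegex);
console.log(matches); // [ 'onclick=alert(1)' ]
```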

Upvotes: 0

Views: 113

Answers (3)

Vladimir Drenovski

Reputation: 24

You can try several different solutions depending on what you actually need. Let's take this tag as an example: <source onerror="alert(1)">

  1. Getting only the attribute name and value (the matches exclude = and "):

/<{1}\w+[\w\s\'\"\=]*(on[^=-\s]+)=["']([\S\w\d]*|[\S\w\d ]*)["']>{1}/gmi

The capture groups will look like this:

array (size=2)
  0 => string 'onerror' 
  1 => string 'alert(1)'

Demo with multiple tests

  2. Getting the attribute with its value (the matches include = and "):

/<{1}\w+[\w\s\'\"\=]*((on[^=-\s]+)=["']([\S\w\d]*|[\S\w\d ]*)["'])>{1}/gmi

The capture groups will look like this:

array (size=3)
  0 => string 'onerror="alert(1)"' 
  1 => string 'onerror'
  2 => string 'alert(1)'

Demo with multiple tests

  3. Getting the entire tag:

/(<{1}\w+[\w\s\'\"\=]*(on[^=-\s]+)=["']([\S\w\d]*|[\S\w\d ]*)["']>{1})/gmi

The capture groups will look like this:

array (size=3)
  0 => string '<source onerror="alert(1)">' 
  1 => string 'onerror'
  2 => string 'alert(1)'

Demo with multiple tests

  4. Getting all of the above:

/(<{1}\w+[\w\s\'\"\=]*((on[^=-\s]+)=["']([\S\w\d]*|[\S\w\d ]*)["'])>{1})/gmi

The capture groups will look like this:

array (size=4)
  0 => string '<source onerror="alert(1)">' 
  1 => string 'onerror="alert(1)"'
  2 => string 'onerror'
  3 => string 'alert(1)'

Demo with multiple tests
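One practical detail when using these patterns from JavaScript: with the `g` flag, `String.prototype.match` returns only the full matches, not the capture groups listed above. A minimal sketch using `matchAll` (ES2020) to read the groups for regexes 1 and 4 on the sample tag; `captureGroups` is a hypothetical helper name.

```javascript
// Regex 1 and regex 4 from the list above, copied unchanged.
const namesAndValues = /<{1}\w+[\w\s\'\"\=]*(on[^=-\s]+)=["']([\S\w\d]*|[\S\w\d ]*)["']>{1}/gmi;
const everything = /(<{1}\w+[\w\s\'\"\=]*((on[^=-\s]+)=["']([\S\w\d]*|[\S\w\d ]*)["'])>{1})/gmi;

const html = '<source onerror="alert(1)">';

// matchAll yields one match array per hit; index 0 is the full match,
// so slice(1) keeps only the capture groups.
function captureGroups(re, input) {
  return [...input.matchAll(re)].map((m) => m.slice(1));
}

console.log(captureGroups(namesAndValues, html)); // [ [ 'onerror', 'alert(1)' ] ]
console.log(captureGroups(everything, html));
// [ [ '<source onerror="alert(1)">', 'onerror="alert(1)"', 'onerror', 'alert(1)' ] ]
```

Both patterns only end the value at a quote directly followed by `>`, so if another attribute follows the `on...` attribute (for example `<img onerror="a" alt="b">`), that attribute ends up inside the captured value.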

EDIT: This is my final edit of this answer. I will not expand it further, since regex is a "not recommended" way to parse HTML.

Upvotes: -1

Ja͢ck

Reputation: 173562

Assuming you have built a DOM of the HTML you're trying to process:

var nodes = root.getElementsByTagName('*');

var result = [].filter.call(nodes, function(el) {
    return [].some.call(el.attributes, function(attr) {
        return attr.name.match(/^on/i);
    });
});

It iterates over all elements found under root and keeps those that have at least one attribute whose name starts with "on".
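The filter above returns the matching elements. If you also need the attribute names and values (what the regex in the question was capturing), a variant along the same lines might look like this. `findOnAttributes` is a hypothetical name, and the sketch only assumes that each element's `attributes` can be turned into an array of objects with `name` and `value`, which holds for DOM elements and also lets it run on plain objects outside a browser.

```javascript
// Hypothetical helper: collects every attribute whose name starts with "on"
// (case-insensitive) from a list of elements. Works with an HTMLCollection
// of DOM elements or with any array of objects shaped like { attributes: [...] }.
function findOnAttributes(elements) {
  const found = [];
  for (const el of Array.from(elements)) {
    for (const attr of Array.from(el.attributes)) {
      if (/^on/i.test(attr.name)) {
        found.push({ element: el, name: attr.name, value: attr.value });
      }
    }
  }
  return found;
}

// In a browser, starting from an HTML string, you could do:
// const doc = new DOMParser().parseFromString(htmlString, "text/html");
// findOnAttributes(doc.getElementsByTagName("*"));
```

Because the browser's parser has already decided where each tag begins and ends, the question of matching "only inside tags" does not arise here.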

Upvotes: 3

neuhaus

Reputation: 4094

Do a non-greedy match for [^>] to make sure you are still inside the HTML element.

<[^>]*?(on[^=-\s]+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
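A quick check of this pattern, assuming the `gmi` flags from the question (the answer does not state any). The input string is illustrative: one `on...` attribute sits in plain text and one inside a tag.

```javascript
// neuhaus's pattern; the gmi flags are an assumption carried over from the question.
const insideTag = /<[^>]*?(on[^=-\s]+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?/gmi;

const input = 'no tags here onclick=alert(1) <img src=x onerror="alert(2)">';

// Every match must start at a '<' and cannot cross a '>' before the attribute,
// so the onclick in plain text is skipped.
for (const m of input.matchAll(insideTag)) {
  console.log(m[1], m[2]); // onerror alert(2)
}
```

This still relies on `<` and `>` marking tag boundaries: a literal `<` in text (as in `1 < 2 onclick=x`) starts a false match, and a `>` inside a quoted attribute value hides any attribute after it, which is the usual argument for parsing the HTML instead.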

Upvotes: -1
