Reputation: 24659
I have the following code:
var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";
var results = html.match(/<(\/?) (\w+) ([^>]*?)>/);
About the three sets of parentheses:
The first means: a forward slash or nothing.
The second means: one or more alphanumeric characters.
The third means: anything but '>', but I don't understand the '*?'!
Also, how do I interpret the fact that the three sets of parentheses are separated by white space?
Regards,
Upvotes: 0
Views: 297
Reputation: 168655
*? in regex is a "lazy star".
A star means "repeat the previous item zero or more times". The previous item in this case is a character class that defines "any character except >".
By default a star on its own is "greedy", which means that it will match as many characters as possible while still meeting the criteria for the rest of the expression around it.
Changing it to a lazy star by adding the question mark means that it will instead match as few characters as possible while still meeting the rest of the criteria.
In the case of your expression, this will in fact make no difference at all to the actual results, because the character to match immediately after the star is a >, which is exactly the character the preceding class excludes. This means that the expression will always match the same text for [^>]* regardless of whether it is lazy or greedy.
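A quick way to check this is to run both variants against the string from the question. This is only a minimal sketch: it assumes the literal spaces in the posted pattern are dropped (with them, nothing matches), and the variable names lazy and greedy are just illustrative.

```javascript
// String from the question.
var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";

// Same pattern twice, without the literal spaces: once lazy, once greedy.
var lazy = html.match(/<(\/?)(\w+)([^>]*?)>/);
var greedy = html.match(/<(\/?)(\w+)([^>]*)>/);

console.log(lazy[0]);   // <div class='test'>
console.log(greedy[0]); // <div class='test'>

// The third group is identical too: both capture " class='test'".
console.log(lazy[3] === greedy[3]); // true
```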
In other regular expressions, the difference is more important because greedy expressions can swallow parts of the string that would have otherwise matched later in the expression.
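To see a case where it does matter, swap the character class for a dot, which is allowed to match > as well. A small sketch using the string from the question (the patterns and variable names here are made up for illustration, they are not from the original code):

```javascript
var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";

// Greedy: .* runs to the end of the string, then backs up to the LAST ">".
var greedyMatch = html.match(/<.*>/)[0];

// Lazy: .*? grows one character at a time and stops at the FIRST ">".
var lazyMatch = html.match(/<.*?>/)[0];

console.log(greedyMatch === html); // true: the greedy version swallowed every tag
console.log(lazyMatch);            // <div class='test'>
```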
However, although there may be no difference to the result, there may still be a difference between greedy and lazy expressions, because the different ways in which they are processed can result in the expressions running at different speeds. Again, I don't think it will make much difference in your case, but in some cases it can have a big impact.
I recommend reading up on regex at http://www.regular-expressions.info/ -- it's got an excellent reference table for all the regex syntax you're likely to need, and articles on many of the difficult topics.
Upvotes: 0
Reputation: 348972
* means "match as many as possible" (possibly zero) of the preceding item. Adding ? means: match just enough for the RegExp to return a match.
Example:
String: Tester>
[^>]*    matches  Tester
[^>]*?   matches  <empty string>
[^>]*e   matches  Teste
[^>]*?e  matches  Te (including the T is required to produce a valid match)
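The four cases above can be reproduced directly. A minimal sketch, where the variable names are only for illustration and [0] is the text of the first match:

```javascript
var s = "Tester>";

var greedyAll = s.match(/[^>]*/)[0];   // takes everything up to the ">"
var lazyAll = s.match(/[^>]*?/)[0];    // zero characters already satisfy the pattern
var greedyE = s.match(/[^>]*e/)[0];    // backs off from "Tester" to the last "e"
var lazyE = s.match(/[^>]*?e/)[0];     // stops at the first "e"

// JSON.stringify makes the empty string visible in the output.
console.log(JSON.stringify(greedyAll)); // "Tester"
console.log(JSON.stringify(lazyAll));   // ""
console.log(JSON.stringify(greedyE));   // "Teste"
console.log(JSON.stringify(lazyE));     // "Te"
```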
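In your case:
String: <input value=">"> junk
[^>]*>   matches  <input value=">
[^>]*?>  matches  <input value=">
Both stop at the first >, because [^>] can never consume a >. The results only differ with a dot: .*> matches <input value=">">, while .*?> matches <input value=">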
Upvotes: 2
Reputation: 11149
An asterisk (*) means match the preceding bit zero or more times. The preceding bit is [^>], meaning anything but a >. As @user278064 says, the ? is redundant. It's meant to make the * non-greedy, but there's no need, as the [^>] already stops the * at the first >. (You could replace [^>] with a . (full-stop/period), which would match any character; then the ? would make sure it matches anything until the first >.)
As for the spaces, they shouldn't be there... they literally match spaces, which I don't think you want.
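The effect of the spaces is easy to check against the string from the question. A short sketch, with illustrative variable names:

```javascript
var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";

// Pattern exactly as posted: a literal space is required right after "<" or "</".
var withSpaces = html.match(/<(\/?) (\w+) ([^>]*?)>/);

// Same pattern with the spaces removed.
var withoutSpaces = html.match(/<(\/?)(\w+)([^>]*?)>/);

console.log(withSpaces);       // null: no tag in html has a space after "<"
console.log(withoutSpaces[1]); // "" (no slash, so this is an opening tag)
console.log(withoutSpaces[2]); // div
console.log(withoutSpaces[3]); // " class='test'" (with the leading space)
```

If whitespace between the parts should be allowed but not required, \s* in place of each literal space is one option.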
Upvotes: 1