Reputation: 3004
I'm very new to regex. I'd managed not to touch it with a 10-foot pole for a long time, but now a personal project is pushing me to learn it.
So I started, and I'm going through the tutorial located here: http://www.regular-expressions.info/tutorial.html
Currently I'm here: http://www.regular-expressions.info/repeat.html
My question is:
The tutorial says <[A-Za-z][A-Za-z0-9]*> will match an HTML tag.
But wouldn't it also match invalid HTML tags like <h11> or <h111>?
Also how would it match the closing tags?
Edit - My question is very specific. I am referring to one particular example in one particular tutorial to clarify whether or not my understanding of repetitions is correct. Again, I REPEAT, I DO NOT care about html parsing with regex.
Upvotes: 4
Views: 6971
Reputation: 93696
The tutorial says <[A-Za-z][A-Za-z0-9]*> will match an HTML tag.
But wouldn't it also match invalid html tags like <h11> or <h111>? Also how would it match the closing tags?
Yes, that will match <h11> as well as <X098wdfhfdshs98fhj2hsdljhkvjnvo9sudvsodfih23234osdfs>.
If you want to match just a letter followed by an optional single digit, so you'd match <h1>, then you want <[A-Za-z][0-9]?>
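To see the difference between the two patterns, here is a minimal sketch in Python (my choice of language, the tutorial itself is engine-agnostic). The names TUTORIAL_TAG, LETTER_OPTIONAL_DIGIT and matches are made up for illustration, and re.fullmatch is used so the whole string has to match:

```python
import re

# Pattern from the tutorial: one letter, then zero or more letters or digits.
TUTORIAL_TAG = re.compile(r"<[A-Za-z][A-Za-z0-9]*>")
# Stricter variant from this answer: one letter, then at most one digit.
LETTER_OPTIONAL_DIGIT = re.compile(r"<[A-Za-z][0-9]?>")


def matches(pattern, text):
    """Return True if the entire string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for tag in ["<h1>", "<h11>", "<h111>", "<a>", "<1h>"]:
        print(tag, matches(TUTORIAL_TAG, tag), matches(LETTER_OPTIONAL_DIGIT, tag))
```

Note that the stricter pattern only allows a single letter, so it also rejects real tags such as <div>. It illustrates how ? differs from *, and is not meant as a validator for HTML tag names.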
Upvotes: 0
Reputation: 7835
I don't see any harm in answering your question seeing as how you are attempting to learn regex:
1) Yes, it will match invalid tags as well, because the pattern is a single letter followed by zero or more letters or digits.
2) It will not match closing tags (there would have to be a search for a / somewhere in there).
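Both points can be checked with a short Python sketch. The second pattern, with an optional / right after the opening bracket, is my own assumed extension and is not from the tutorial. The names OPENING_ONLY and OPEN_OR_CLOSE are hypothetical:

```python
import re

# The tutorial's pattern: opening tags only.
OPENING_ONLY = re.compile(r"<[A-Za-z][A-Za-z0-9]*>")
# Assumed extension: allow an optional slash right after '<'.
OPEN_OR_CLOSE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")

text = "<b>bold</b>"

# findall returns every non-overlapping match in the string.
opening = OPENING_ONLY.findall(text)
both = OPEN_OR_CLOSE.findall(text)

print(opening)  # ['<b>']
print(both)     # ['<b>', '</b>']
```

The original pattern skips </b> because the character after < must be a letter, and / is not one.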
One more comment: one way people used to look for HTML tags inside a document was to match the pattern of opening and closing brackets, like so:
<\/?[^>]*>
That's an opening bracket, an optional slash, any character other than a closing bracket repeated zero or more times, and then a closing bracket. Of course, I am not recommending anyone do this. It's merely left here as an exercise.
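As a rough illustration of why this is not recommended, here is a small Python sketch of that bracket pattern. The sample string and the names ANY_BRACKETS and found are invented for the example:

```python
import re

# Opening bracket, optional slash, anything except '>', closing bracket.
ANY_BRACKETS = re.compile(r"<\/?[^>]*>")

# Hypothetical sample text mixing real tags with a plain comparison.
text = '<a href="x">link</a> 1 < 2 and 3 > 2'

found = ANY_BRACKETS.findall(text)
print(found)  # ['<a href="x">', '</a>', '< 2 and 3 >']
```

It picks up tags with attributes and closing tags, but it also grabs "< 2 and 3 >", which is not a tag at all. The pattern only knows about brackets, not about HTML.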
Upvotes: 7