DS.

Reputation: 3004

Regex to identify HTML tags (as a regex repetition learning exercise ONLY!!)

I'm very new to regex. I'd managed to avoid touching it with a 10-foot pole for a long time, but now a personal project is pushing me to learn it.

So I started, and I'm going through the tutorial located here: http://www.regular-expressions.info/tutorial.html

Currently I'm here: http://www.regular-expressions.info/repeat.html

My question is:

The tutorial says <[A-Za-z][A-Za-z0-9]*> will match an HTML tag.

But wouldn't it also match invalid html tags like - <h11> or <h111>? Also how would it match the closing tags?

Edit - My question is very specific. I am referring to one particular example in one particular tutorial to clarify whether or not my understanding of repetitions is correct. Again, I REPEAT, I DO NOT care about html parsing with regex.

Upvotes: 4

Views: 6971

Answers (2)

Andy Lester

Reputation: 93696

The tutorial says <[A-Za-z][A-Za-z0-9]*> will match an HTML tag.

But wouldn't it also match invalid html tags like - <h11> or <h111>? Also how would it match the closing tags?

Yes, that will match <h11> as well as <X098wdfhfdshs98fhj2hsdljhkvjnvo9sudvsodfih23234osdfs>.

If you want to match just a letter followed by an optional single digit, so that you'd match <h1> but not <h11>, then you want <[A-Za-z][0-9]?>. Note that this also rejects multi-letter tags such as <div>.
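To see the difference between the two repetitions, here is a minimal Python sketch (Python is my choice here; the question does not name a language). The names LOOSE, STRICT and samples are made up for illustration. It uses re.fullmatch so that each sample string must match the pattern from start to end.

```python
import re

# Pattern from the tutorial: a letter, then zero or more letters or digits.
LOOSE = re.compile(r"<[A-Za-z][A-Za-z0-9]*>")
# Pattern from this answer: a letter, then at most one digit.
STRICT = re.compile(r"<[A-Za-z][0-9]?>")

# Hypothetical sample strings, chosen only to exercise both patterns.
samples = ["<h1>", "<h11>", "<h111>", "<p>", "<div>", "</h1>"]

loose_hits = [s for s in samples if LOOSE.fullmatch(s)]
strict_hits = [s for s in samples if STRICT.fullmatch(s)]

print(loose_hits)   # ['<h1>', '<h11>', '<h111>', '<p>', '<div>']
print(strict_hits)  # ['<h1>', '<p>']
```

The * lets the second character class repeat any number of times, which is why <h11> and <h111> get through; the ? caps it at one digit, at the cost of no longer matching <div>.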

Upvotes: 0

erewok

Reputation: 7835

I don't see any harm in answering your question seeing as how you are attempting to learn regex:

1) Yes, it will match invalid tags as well, because the pattern is any letter followed by zero or more letters or digits.

2) It will not match closing tags (there would have to be a search for a / somewhere in there).
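Both points can be checked with a short Python sketch. The second pattern, with an optional slash after the opening bracket, is only my guess at one way to "search for a / somewhere in there"; it is not from the tutorial, and the variable names and sample text are made up.

```python
import re

# Hypothetical sample text containing one valid and one invalid tag pair.
text = "<h1>Title</h1><h111>oops</h111>"

# The tutorial's pattern: opening tags only.
opening_only = re.findall(r"<[A-Za-z][A-Za-z0-9]*>", text)

# Assumed variant: /? allows zero or one slash right after the "<".
with_closing = re.findall(r"</?[A-Za-z][A-Za-z0-9]*>", text)

print(opening_only)  # ['<h1>', '<h111>']
print(with_closing)  # ['<h1>', '</h1>', '<h111>', '</h111>']
```

Neither pattern knows which tag names are real HTML, so <h111> is accepted either way.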

One more comment: one way people used to look for html tags inside a document was to search for the pattern of opening and closing brackets, like so:

<\/?[^>]*>

That's an opening bracket, an optional slash, (anything but a closing bracket) repeated zero or more times, and then a closing bracket. Of course, I am not recommending anyone do this. It's merely left here as an exercise.
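As a rough illustration of why this is not recommended, here is a small Python sketch that runs the bracket pattern over a made-up string. The names BRACKETS, text and found are mine, and the sample text is chosen on purpose to include a stray < and >.

```python
import re

# The bracket pattern from above, unchanged.
BRACKETS = re.compile(r"<\/?[^>]*>")

# Hypothetical input: one real link plus a comparison that only looks like a tag.
text = 'See <a href="x">link</a> when 1 < 2 > 0'

found = BRACKETS.findall(text)
print(found)  # ['<a href="x">', '</a>', '< 2 >']
```

The [^>]* repetition happily consumes attributes, which is useful, but it also accepts anything at all between two brackets, so "< 2 >" is reported as a tag.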

Upvotes: 7
