RegEx for matching HTML tags

Question

I am trying to use a regular expression to extract the start tags in each line of a given HTML document. From the following lines I expect to get only 'body' and 'h1' as start tags in the first line, and 'html', 'head' and 'title' as start tags in the second line:

I have already tried to do this using the following regular expression:

start_tags = re.findall(r'<(\w+)\s*.*?[^\/]>',line)

'Website'
'HTML Parser - II'

But my output for the first line is ['body','h1','br'], although I did not expect 'br' to be matched, since I excluded '/'.

For the second line the output is ['html','title'], whereas I expected 'head' to be matched too. I would be grateful if you could tell me which part of my code is wrong.


Answers (1)
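A quick way to see what goes wrong is to print the full text each match consumes, not just the captured name. The sketch below does that for the pattern from the question. The two input lines are hypothetical stand-ins, since the original markup is not shown, and `show_matches`, `START_TAG` and `start_tags` are illustrative names. The point it makes: `[^\/]>` only requires that the character before some later `>` is not a slash, so the lazy `.*?` simply runs on past the end of the tag until it finds one. That is how `<br />` ends up matched together with the following `</body>`, and how `<head>` gets swallowed into the match that starts at `<html>`.

```python
import re

# Hypothetical input lines (assumption): the original markup was not preserved.
lines = [
    "<body class='main'><h1>Website</h1><br /></body></html>",
    "<html><head><title>HTML Parser - II</title></head>",
]

# The pattern from the question, unchanged.
question_pattern = re.compile(r'<(\w+)\s*.*?[^\/]>')

def show_matches(line):
    """Return (full match, captured name) pairs for the question's pattern."""
    return [(m.group(0), m.group(1)) for m in question_pattern.finditer(line)]

# One possible fix (a sketch, not the only option): keep the match inside a
# single tag with [^<>]*, and reject a '/' directly before the closing '>'.
START_TAG = re.compile(r'<(\w+)(?:\s[^<>]*)?(?<!/)>')

def start_tags(line):
    """Return the names of non-self-closing start tags in one line."""
    return START_TAG.findall(line)

for line in lines:
    for full, name in show_matches(line):
        print(f"{name!r:8} consumed {full!r}")
    print("fixed:", start_tags(line))
```

This sketch assumes attribute values never contain `<` or `>`. For arbitrary HTML, a real parser such as the standard library's `html.parser.HTMLParser` is the safer tool.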

RegEx 1 for h1-h6 tags
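As a minimal sketch of what a pattern restricted to heading start tags could look like (the names `H_TAG` and `heading_start_tags` and the sample line are assumptions for illustration, not taken from the original answer):

```python
import re

# <h1> ... <h6> start tags, with optional attributes.
# \b stops 'h1' from matching inside a longer name such as 'h10';
# [^>]* keeps the match inside a single tag.
H_TAG = re.compile(r'<(h[1-6])\b[^>]*>', re.IGNORECASE)

def heading_start_tags(line):
    """Return the heading start-tag names found in one line, lowercased."""
    return [name.lower() for name in H_TAG.findall(line)]

# Hypothetical sample line
sample = "<body><h1>Website</h1><H2 class='x'>Sub</H2><br /></body>"
print(heading_start_tags(sample))
```

Closing tags such as `</h1>` are skipped automatically, because the character after `<` has to be `h`.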

RegEx Circuit

RegEx 2 for head and body
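The same idea can be narrowed to just `head` and `body`. Again this is only a sketch under assumptions: `HEAD_BODY`, `head_body_start_tags` and the sample line are illustrative names and data, not part of the original answer.

```python
import re

# Start tags named exactly 'head' or 'body', with optional attributes.
# \b prevents a match on longer names such as <header>.
HEAD_BODY = re.compile(r'<(head|body)\b[^>]*>', re.IGNORECASE)

def head_body_start_tags(line):
    """Return 'head'/'body' start-tag names found in one line, lowercased."""
    return [name.lower() for name in HEAD_BODY.findall(line)]

# Hypothetical sample line
sample_line = "<html><head><title>HTML Parser - II</title></head><body class='1'><header>"
print(head_body_start_tags(sample_line))
```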

Performance
