Reputation: 3563

How to check that all HTML tags are closed with regex

I mean that each < should have a matching >. A string without any < or > should be valid too.

Any ideas?

Upvotes: 1

Answers (2)

Rob W

Reputation: 349252

I once wrote a JavaScript BBCode parser that also dealt with incorrectly closed tags. The same concept applies to HTML (and any other markup language that relies on a tree structure).

Define variables: var string = ""; var lastIndex = 0; var stack = []; var parsedString = ""; // and some more

1. Loop through the string until a < is matched, using string.indexOf("<", lastIndex).
2. Select the tag name and search for the closing > (using an RE, for example /[^>]*>/). Set lastIndex to the index of this >, plus 1.
3. If it is an opening tag, push this value (tagName) onto the stack array defined above.
4. If a closing tag is encountered, walk through the stack from the last element backwards:
   - If the start tag is the last element of stack, use stack.pop(). Continue at 1.
   - If the start tag isn't the last element of the array:
     - If your tag is important, keep searching for the opening tag (</div> should close any <div>, even if you have to throw away 9001 <span> declarations).
     - While you walk through the array, check the status of the tags you encounter: are these "important" elements? (<strong> is less important than <div>, for example.)
     - If you encounter an important tag (such as <div>) while your closing tag was a </em>, ignore the closing tag and go back to 1.

When step 1 finds no further < (indexOf returns -1), append the remaining string to the result: parsedString += string.substring(lastIndex, string.length);

After following these steps, you've parsed a string.
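
The steps above could be sketched roughly as follows. This is a simplified, strict variant and not the original BBCode parser: it only reports whether the tags are balanced, and it does not implement the "important tag" recovery. The function name allTagsClosed and the VOID_TAGS list are hypothetical, and the sketch assumes input without comments, doctype declarations, script blocks, or > characters inside attribute values.

```javascript
// Assumption: elements that never take a closing tag are simply skipped.
var VOID_TAGS = ["area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"];

// Hypothetical helper: true when every opening tag is closed in order.
function allTagsClosed(string) {
    var lastIndex = 0;
    var stack = [];
    while (true) {
        // Step 1: find the next "<"; stop when there is none left.
        var open = string.indexOf("<", lastIndex);
        if (open === -1) break;
        // Step 2: find the closing ">" and move lastIndex past it.
        var close = string.indexOf(">", open);
        if (close === -1) return false;      // "<" without any ">"
        lastIndex = close + 1;
        var inner = string.substring(open + 1, close);
        var match = /^(\/?)([A-Za-z][A-Za-z0-9]*)/.exec(inner);
        if (!match) return false;            // e.g. "<<a>" or "< >"
        var isClosing = match[1] === "/";
        var tagName = match[2].toLowerCase();
        var selfClosing = /\/\s*$/.test(inner);
        if (isClosing) {
            // Step 4, strict form: the closing tag must match the last opened tag.
            if (stack.pop() !== tagName) return false;
        } else if (!selfClosing && VOID_TAGS.indexOf(tagName) === -1) {
            // Step 3: remember the opening tag.
            stack.push(tagName);
        }
    }
    // Anything still on the stack was never closed.
    return stack.length === 0;
}
```

For example, allTagsClosed("<div><span>x</span></div>") returns true, while allTagsClosed("<a></b>") and allTagsClosed("<div>") return false. A string without any < is accepted.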

Upvotes: 1

Jens

Reputation: 25593

Your string has a tag that's not properly opened or closed if there are two consecutive opening brackets, or two consecutive closing brackets, with only non-bracket characters between them. Such cases are matched by

<(?=[^>]*<)|>(?=[^<]*>)

Note that this works reliably only on HTML without script blocks or comments. Also, it only checks the brackets; it does not check whether every tag you opened is closed again (i.e. it will flag <<a> as wrong, but not <a></b>).
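
A minimal way to try the pattern in JavaScript might look like this. The names unbalancedBracket and bracketsLookBalanced are hypothetical and used only for illustration.

```javascript
// The pattern from above: a "<" followed by another "<" before any ">",
// or a ">" followed by another ">" before any "<".
// No "g" flag, so test() keeps no state between calls.
var unbalancedBracket = /<(?=[^>]*<)|>(?=[^<]*>)/;

// Hypothetical helper: true when the pattern finds nothing to complain about.
function bracketsLookBalanced(html) {
    return !unbalancedBracket.test(html);
}

console.log(bracketsLookBalanced("plain text")); // true
console.log(bracketsLookBalanced("<<a>"));       // false
console.log(bracketsLookBalanced("<a></b>"));    // true (brackets only, tag names are not compared)
```

Because the pattern looks for a second bracket of the same kind, a single trailing < with no bracket after it (as in "<a") is not flagged either, so it may need a separate check.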

Upvotes: 0
