Reputation: 3563
I mean that each < should have an appropriate >. A string without any < or > should be valid too.
Any idea?
Upvotes: 1
Views: 3728
Reputation: 348992
I once wrote a JavaScript BB-code parser that also dealt with incorrectly closed tags. The same concept applies to HTML (and to any other markup language that relies on a tree structure).
var string = "";
var lastIndex = 0;
var stack = [];
var resultString = ""; // and some more
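1. Scan the string until a < is matched, using string.indexOf("<", lastIndex).
2. Read the tag name and look for the closing > (using an RE: /[^<]+?/). Set lastIndex to the index of this >, plus 1.
3. If it is an opening tag, add it to an array defined beforehand (e.g. var stack = [];).
4. If it is a closing tag and the matching opening tag is the last element of stack, use stack.pop(). Continue at 1.
5. If the matching opening tag is not the last element of stack, decide by how important the tags are:
   - An important closing tag closes everything that was opened after its opening tag (e.g. </div> should close any <div>, even if you have to throw away 9001 <span> declarations).
   - Rank the tags by importance (<strong> is less important than <div>, for example).
   - If you run into a more important opening tag (e.g. <div>) while your closing tag was a </em>, ignore the closing tag and go back to 1.

When 1 evaluates to false (no < found), add the remaining string to the result: resultString += string.substring(lastIndex, string.length);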
After following these steps, you've parsed a string.
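The steps above can be sketched roughly as follows. This is only a minimal illustration, not the original parser: the function name repairTags is hypothetical, the importance ranking between tags is left out, and unclosed inner tags are closed automatically instead of being thrown away. It also assumes simple <name> and </name> tags, with no comments, scripts, or self-closing tags.

```javascript
// Hypothetical helper: repairs tag nesting in a simple markup string.
// Assumption: no importance ranking; inner tags are auto-closed.
function repairTags(string) {
  var lastIndex = 0;
  var stack = [];
  var resultString = "";

  while (true) {
    // Step 1: find the next "<".
    var open = string.indexOf("<", lastIndex);
    if (open === -1) break;

    // Step 2: find the closing ">" of this tag.
    var close = string.indexOf(">", open);
    if (close === -1) break; // unterminated tag: the rest is kept as text

    // Copy the text in front of the tag, then move past the tag.
    resultString += string.substring(lastIndex, open);
    lastIndex = close + 1;

    var tag = string.substring(open, close + 1);
    var match = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/.exec(tag);
    if (!match) {
      resultString += tag; // not a recognisable tag, keep it verbatim
      continue;
    }
    var name = match[2].toLowerCase();

    if (match[1] === "") {
      // Step 3: opening tag, remember it on the stack.
      stack.push(name);
      resultString += tag;
    } else {
      // Steps 4 and 5: closing tag.
      var position = stack.lastIndexOf(name);
      if (position === -1) continue; // stray closing tag: ignore it
      // Close every tag opened after the match, then the match itself.
      while (stack.length > position) {
        resultString += "</" + stack.pop() + ">";
      }
    }
  }

  // No "<" left: add the remaining string to the result.
  resultString += string.substring(lastIndex, string.length);
  // Close whatever is still open.
  while (stack.length > 0) {
    resultString += "</" + stack.pop() + ">";
  }
  return resultString;
}
```

For example, repairTags("<div><span>text</div>") returns "<div><span>text</span></div>", and a stray closing tag such as the one in "a</em>b" is dropped, giving "ab".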
Upvotes: 1
Reputation: 25563
Your string has a tag that is not properly opened or closed if there are two consecutive opening brackets, or two consecutive closing brackets, with only non-bracket characters between them. These are matched by
<(?=[^>]*<)|>(?=[^<]*>)
Note that this works reliably only on HTML without script sections or comments! Also, it only checks the brackets; it does not check whether every tag you opened is closed again. (I.e. it will detect <<a> as wrong, but not <a></b>.)
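As a small sketch of how the pattern above could be used from JavaScript (the function name hasUnbalancedBrackets is hypothetical and not part of the answer):

```javascript
// Hypothetical helper: returns true when the pattern quoted above finds
// two opening or two closing brackets in a row with no bracket between.
function hasUnbalancedBrackets(text) {
  var unbalanced = /<(?=[^>]*<)|>(?=[^<]*>)/;
  return unbalanced.test(text);
}

console.log(hasUnbalancedBrackets("<<a>"));        // true
console.log(hasUnbalancedBrackets("<a></b>"));     // false
console.log(hasUnbalancedBrackets("no brackets")); // false
```

Because both alternatives need a second bracket to trigger, a single unmatched bracket such as "<a" with nothing after it is not reported by this check alone.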
Upvotes: 0