Reputation:
I thought of counting the matches of "/<[a-z0-9]+>/i" (for example with preg_match_all)
and then checking that the closing tags, i.e. "/<\/[a-z0-9]+>/i", occur the same number of times.
But I am not too sure about this. How would you count all opened tags and check that each one has a matching closing tag?
P.S. I don't need to handle attributes or XML-style self-closing tags (/>).
I just need a count of plain, simple HTML tags.
Thanks
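For reference, the counting idea described above could be sketched roughly as follows. The function name `tagCounts` is hypothetical, and the sketch relies on `preg_match_all` returning the number of matches. Equal counts do not prove that the tags are properly nested, as the second call shows.

```php
<?php
// Hypothetical helper: counts plain opening and closing tags (no attributes).
function tagCounts(string $html): array
{
    $open  = preg_match_all('/<[a-z0-9]+>/i', $html);   // e.g. <p>, <h1>
    $close = preg_match_all('/<\/[a-z0-9]+>/i', $html); // e.g. </p>, </h1>
    return ['open' => $open, 'close' => $close];
}

$ok = tagCounts('<p>hello <b>world</b></p>');
var_dump($ok['open'] === $ok['close']);   // bool(true)

// Counts also match here, although the closing tag comes first.
$bad = tagCounts('</p><p>');
var_dump($bad['open'] === $bad['close']); // bool(true)
```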
Upvotes: 0
Views: 3002
Reputation: 459
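My approach:
function checkHtml($html) {
    $level = 0;     // current nesting depth
    $map = [];      // closing tags still expected, keyed by "/tag" . depth
    $length = strlen($html);
    $open = false;  // true while reading a tag name after '<'
    $tag = '';
    for ($i = 0; $i < $length; $i++) {
        $c = $html[$i];
        if ($c == '<') {
            $open = true;
            $tag = '';
        } else if ($open && ($c == '>' || $c == ' ')) {
            // The tag name ends at '>' or at the first space (attributes are ignored)
            $open = false;
            if (in_array($tag, ['br', 'br/', 'hr/', 'img/', 'hr', 'img'])) {
                continue; // void elements have no closing tag
            }
            if (strpos($tag, '/') === 0) {
                // Closing tag: it must match the tag opened one level up
                if (!isset($map[$tag.($level - 1)])) {
                    return false;
                }
                $level--;
                unset($map[$tag.$level]);
            } else {
                // Opening tag: remember which closing tag is expected at this depth
                $map['/'.$tag.$level] = true;
                $level++;
            }
        } else if ($open) {
            $tag .= $c;
        }
    }
    return $level == 0; // every opened tag was closed
}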
Upvotes: 0
Reputation: 2470
OK, one solution would be:
function open_tags($page) // $page: your HTML/XML/other content
{
    $arr = array();
    $i = 0;
    while (($i = strpos($page, '<', $i)) !== false) // position where the tag starts
    {
        $end = strpos($page, '>', $i); // position where the tag ends
        if ($end === false)
            return false; // the tag is never terminated
        $tag = substr($page, $i + 1, $end - $i - 1); // tag text without the angle brackets
        if (strpos($tag, '/') === 0) // it's an end tag
        {
            // pop the last tag pushed onto the stack and check that it matches this one
            if (array_pop($arr) != substr($tag, 1))
                return false;
        }
        else
        {
            array_push($arr, $tag); // push the new tag onto the stack
        }
        $i = $end + 1; // continue searching after this tag
    }
    return $arr;
}
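This returns the tags that are still open, in order, or false on error. An empty array means every tag was closed; since an empty array is also falsy in PHP, compare the result with === false to detect an error.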
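Edit: the same function, but it also returns false when another '<' appears before the current tag is closed:
function open_tags($page) // $page: your HTML/XML/other content
{
    $arr = array();
    $i = 0;
    while (($i = strpos($page, '<', $i)) !== false) // position where the tag starts
    {
        $end = strpos($page, '>', $i); // position where the tag ends
        $next = strpos($page, '<', $i + 1); // position of the next '<', if any
        if ($end === false || ($next !== false && $next < $end))
            return false; // unterminated tag, or a new '<' before the closing '>'
        $tag = substr($page, $i + 1, $end - $i - 1); // tag text without the angle brackets
        if (strpos($tag, '/') === 0) // it's an end tag
        {
            // pop the last tag pushed onto the stack and check that it matches this one
            if (array_pop($arr) != substr($tag, 1))
                return false;
        }
        else
        {
            array_push($arr, $tag); // push the new tag onto the stack
        }
        $i = $end + 1; // continue searching after this tag
    }
    return $arr;
}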
Upvotes: -1
Reputation: 48091
I wrote this handy function. It could probably be faster if it searched for both opening and closing tags with a single preg_match_all, but this way it is more readable:
<?php
//> Counts the <[a-z]> and </[a-z]> tags and, optionally, checks their order.
//> The order check pairs the n-th closing tag with the n-th opening tag,
//> so it only suits flat (non-nested) markup.
//> Note: write br as <br /> so that it is not counted as an opening tag.
function validHTML($html, $checkOrder = true) {
    preg_match_all('#<([a-z]+)>#i', $html, $start, PREG_OFFSET_CAPTURE);
    preg_match_all('#<\/([a-z]+)>#i', $html, $end, PREG_OFFSET_CAPTURE);
    $start = $start[1]; // list of [tag name, offset] pairs for opening tags
    $end = $end[1];     // list of [tag name, offset] pairs for closing tags
    if (count($start) != count($end)) {
        throw new Exception('Check numbers of tags');
    }
    if ($checkOrder) {
        $is = 0;
        foreach ($end as $v) {
            // Same name required, and the closing tag must come after the opening tag
            if ($v[0] != $start[$is][0] || $v[1] < $start[$is][1]) {
                throw new Exception('End tag ['.$v[0].'] not opened');
            }
            $is++;
        }
    }
    return true;
}
//> Usage:
try {
    validHTML('<p>hello</p><li></li></p><p>');
} catch (Exception $e) {
    echo $e->getMessage();
}
Note: to also match h1 or any other tag whose name contains digits, add 0-9 to the character class in both patterns, e.g. [a-z0-9]+.
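The order check in validHTML pairs closing tags with opening tags by position, so correctly nested markup such as <div><p></p></div> is rejected. A stack-based variant that uses a single preg_match_all might look like the sketch below. The name `validNestedHTML` and the `$void` parameter are hypothetical, and tags with attributes are simply not matched, in line with the question's "plain simple tags" requirement.

```php
<?php
// Hypothetical helper, not part of the answer above.
function validNestedHTML(string $html, array $void = ['br', 'hr', 'img']): bool
{
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\s*/?>#i', $html, $tags, PREG_SET_ORDER);
    $stack = [];
    foreach ($tags as $t) {
        $name = strtolower($t[2]);
        if (in_array($name, $void, true)) {
            continue; // void elements never need a closing tag
        }
        if ($t[1] === '/') {
            // A closing tag must match the innermost open tag.
            if (array_pop($stack) !== $name) {
                return false;
            }
        } else {
            $stack[] = $name;
        }
    }
    return $stack === []; // true only if nothing is left open
}

var_dump(validNestedHTML('<div><p>hi</p></div>'));         // bool(true)
var_dump(validNestedHTML('<p>hello</p><li></li></p><p>')); // bool(false)
```

Because tags are pushed and popped in document order, nesting is checked without comparing offsets.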
Upvotes: 2
Reputation: 318498
The proper way to validate HTML is to use an HTML parser. Regular expressions cannot reliably handle nested markup, so using them for HTML is the wrong tool; see "RegEx match open tags except XHTML self-contained tags".
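As a rough sketch of the parser route, assuming PHP's bundled DOM extension (libxml) is available, the parse errors can be collected as shown below. The function name `htmlParseErrors` is hypothetical. libxml's HTML parser is forgiving and recovers from many mistakes, and the exact messages depend on the libxml version, so treat the result as a hint rather than a strict verdict.

```php
<?php
// Hypothetical helper: returns the parser's error messages for a non-empty HTML string.
function htmlParseErrors(string $html): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

print_r(htmlParseErrors('<p>hello</p></p>'));
```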
Upvotes: 1