Tyler Carter

Reputation: 61557

PHP Based HTML Validator

I need to find a PHP-based HTML validator (W3C-like) that can detect invalid HTML or XHTML. I've searched Google a little, but I'm curious whether anyone has used one they particularly liked.

I have the HTML in a string:

$html = "<html><head>.....</body></html>";

I would like to test the page and have the validator return the errors, without echoing or printing anything.

I've seen:
-http://www.bermi.org/xhtml_validator
-http://twineproject.sourceforge.net/doc/phphtml.html

The background is that I'd like a function or class that runs on every page and checks whether the file has been modified since the last access date (or something similar). If it has, it runs the validator, so I am notified of invalid HTML immediately while coding.
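To make that workflow concrete, here is a minimal sketch, assuming the intent is to validate only when the file has changed since the last check. The names validate_if_changed(), $stampFile and $validator are hypothetical; the validator is passed in as a callback that takes the HTML string and returns an array of errors, so any of the tools discussed here could be plugged in.

```php
<?php
// Sketch only: validate_if_changed() and its parameters are hypothetical names.
// $validator is any callable that takes an HTML string and returns an array of errors.
function validate_if_changed(string $file, string $stampFile, callable $validator): ?array
{
    clearstatcache(true, $file);          // avoid a stale cached mtime
    $modified = filemtime($file);

    // The stamp file stores the mtime seen at the previous check (0 if never checked).
    $lastChecked = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;

    if ($modified === false || $modified <= $lastChecked) {
        return null;                      // unchanged since the last check: skip validation
    }

    file_put_contents($stampFile, (string) $modified);

    // Return the errors to the caller instead of echoing them.
    return $validator((string) file_get_contents($file));
}
```

A null result means "nothing to do", while an array (possibly empty) is whatever the validator reported. Because mtimes have one-second resolution, two edits within the same second as a check can be missed.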

Upvotes: 4

Views: 10530

Answers (4)

Nikos M.

Reputation: 8325

I had a case where I needed to check partial HTML code for unmatched and malformed tags (mostly </br>, a common error in my samples), and the various heavy-duty validators were overkill. So I ended up writing my own validation routine in PHP, pasted below. Two notes: you may need to use mb_substr instead of index-based character retrieval if your text is in different languages, and the routine does not parse CDATA sections or script/style tags, although it can be extended easily.

function check_html( $html )
{
    $stack = array();
    $autoclosed = array('br', 'hr', 'input', 'embed', 'img', 'meta', 'link', 'param', 'source', 'track', 'area', 'base', 'col', 'wbr');
    $l = strlen($html); $i = 0;
    $incomment = false; $intag = false; $instring = false;
    $closetag = false; $tag = '';
    while($i<$l)
    {
        while($i<$l && preg_match('#\\s#', $c=$html[$i])) $i++;
        if ( $i >= $l ) break;
        if ( $incomment && ('-->' === substr($html, $i, 3)) )
        {
            // close comment
            $incomment = false;
            $i += 3;
            continue;
        }
        $c = $html[$i++];
        if ( '<' === $c )
        {
            if ( $incomment ) continue;
            if ( $intag )  return false;
            if ( '!--' === substr($html, $i, 3) )
            {
                // open comment
                $incomment = true;
                $i += 3;
                continue;
            }

            // open tag
            $intag = true;
            if ( $i < $l && '/' === $html[$i] ) // bounds check: input may end right after '<'
            {
                $i++;
                $closetag = true;
            }
            else
            {
                $closetag = false;
            }
            $tag = '';
            while($i<$l && preg_match('#[a-z0-9\\-]#i', $c=$html[$i]) )
            {
                $tag .= $c;
                $i++;
            }
            if ( !strlen($tag) ) return false;
            $tag = strtolower($tag);
            if ( $i<$l && !preg_match('#[\\s/>]#', $html[$i]) ) return false;
            if ( $i<$l && $closetag && preg_match('#^\\s*/>#sim', substr($html, $i)) ) return false;
            if ( $closetag )
            {
                if ( in_array($tag, $autoclosed) || (array_pop($stack) !== $tag) )
                    return false;
            }
            else if ( !in_array($tag, $autoclosed) )
            {
                $stack[] = $tag;
            }
        }
        else if ( '>' === $c )
        {
            if ( $incomment ) continue;
            
            // close tag
            if ( !$intag ) return false;
            $intag = false;
        }
    }
    return !$incomment && !$intag && empty($stack);
}

Upvotes: -2

Pons

Reputation: 1776

If you can't use Tidy (some hosting services do not enable this PHP module), you can use this PHP class: http://www.barattalo.it/html-fixer/

Upvotes: 0

Byron Whitlock

Reputation: 53861

While it isn't strictly PHP (it is an executable), one I really like is W3C's HTML Tidy. It shows what is wrong with the HTML and fixes it if you want it to. It also beautifies the HTML so it doesn't look like a mess. It runs from the command line and is easy to integrate into PHP.

Check it out: http://www.w3.org/People/Raggett/tidy/
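As a rough illustration of the PHP integration, here is a sketch that uses PHP's optional tidy extension rather than the command-line executable, and returns the diagnostics as an array instead of printing them. The function name html_tidy_errors() is made up for this example, and the filter assumes Tidy's usual "line N column M - ..." message format.

```php
<?php
// Sketch only: html_tidy_errors() is a hypothetical helper name.
// Assumes the optional "tidy" PHP extension; returns null when it is not loaded.
function html_tidy_errors(string $html): ?array
{
    if (!extension_loaded('tidy')) {
        return null;                        // extension not enabled on this host
    }

    $tidy = new tidy();
    $tidy->parseString($html, [], 'utf8');  // parse in memory, nothing is echoed
    $tidy->diagnose();                      // run Tidy's diagnostics on the parsed markup

    // errorBuffer holds one message per line, plus some summary/info lines.
    $lines = preg_split('/\R/', (string) $tidy->errorBuffer, -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the messages that carry a source position.
    return array_values(array_filter($lines, function ($line) {
        return preg_match('/^line \d+ column \d+ - /', $line) === 1;
    }));
}
```

For example, html_tidy_errors($html) could be called on the page's HTML string, and a non-empty array logged or shown in a debug panel. Tidy also reports warnings such as a missing doctype, so fragments will rarely come back completely clean.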

Upvotes: 2

Robert Elwell

Reputation: 6668

There's no need to reinvent the wheel on this one. There's already a PEAR library that interfaces with the W3C HTML Validator API. They're willing to do the work for you, so why not let them? :)

Upvotes: 6
