Ben

Reputation: 25807

Which PHP function validates whether a string is valid HTML?

Which PHP function validates whether a string is HTML? My goal is to take input from the user and check whether that input is HTML and not just a plain string.

Example of a string that is not HTML:

sdkjshdk<div>jd</h3>ivdfadfsdf or sdkjshdkivdfadfsdf

Example of an HTML string:

<div>sdfsdfsdf<label>dghdhdgh</label> fdsgfgdfgfd</div>

Thanks

Upvotes: 7

Views: 32849

Answers (7)

You should use:

$html = "<html><body><p>This is array.</p><br></body></html>";

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
libxml_clear_errors();
$dom = new DOMDocument();
$dom->loadHTML($html);
if (empty(libxml_get_errors())) {
  echo "This is good HTML";
} else {
  echo "This is not HTML";
}

Upvotes: 2

Diogo Gomes

Reputation: 2265

simplexml_load_string will fail if the input does not have a single root node. For example, this HTML:

<p>A</p><p>B</p> will be reported as invalid.

Here's my function:

function check($string){
    // Skip any text before the first tag; no '<' at all means no markup
    $start = strpos($string, '<');
    if ($start === false) {
        return false;
    }
    $string = substr($string, $start);

    // xml requires one root node
    $string = "<div>$string</div>";

    // Collect parser errors instead of emitting warnings
    libxml_use_internal_errors(true);
    libxml_clear_errors();
    simplexml_load_string($string);

    return count(libxml_get_errors()) == 0;
}

Upvotes: 3

Eineki

Reputation: 14909

Maybe you need to check if the string is well formed.

I would use a function like this:

function check($string) {
  // Isolate the chunk between the first '<' and the last '>'
  $start = strpos($string, '<');
  if ($start === false) {
    return false; // no tag at all
  }
  $end = strrpos($string, '>', $start);
  if ($end === false) {
    return false; // a tag was opened but never closed
  }
  $string = substr($string, $start, $end - $start + 1);

  // Collect parser errors instead of emitting warnings
  libxml_use_internal_errors(true);
  libxml_clear_errors();
  $xml = simplexml_load_string($string);
  return $xml !== false && count(libxml_get_errors()) == 0;
}

Just a warning: HTML permits unbalanced markup like the following. It is not a well-formed XML chunk, but it is a legal HTML chunk:

<ul><li>Hi<li> I'm another li</li></ul>
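The difference shows up when the same fragment is given to both parsers. This is only a minimal sketch: the helper name isWellFormedXml is hypothetical, and no output is shown for the HTML parser because the diagnostics it records can vary with the libxml version.

```php
<?php
// Hypothetical helper: true only if $fragment parses as well-formed XML.
function isWellFormedXml(string $fragment): bool
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $xml = simplexml_load_string($fragment);
    $ok = $xml !== false && count(libxml_get_errors()) === 0;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$fragment = "<ul><li>Hi<li> I'm another li</li></ul>";

// The XML parser rejects the unclosed first <li>.
var_dump(isWellFormedXml($fragment)); // bool(false)

// The HTML parser is lenient and still builds a document tree from it.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
var_dump($dom->loadHTML($fragment));
libxml_clear_errors();
```

So an XML-based check is stricter than HTML itself and will reject some fragments that browsers accept.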

Disclaimer: I've modified the code (without testing it) in order to detect well-formed HTML inside the string.

A last thought: maybe you should use strip_tags to control user input (as I've seen in your comments).

Upvotes: 11

Ali Asgari

Reputation: 840

If you also want to make your site secure, you certainly have to use an HTML purifier such as HTML Purifier, Tidy, etc.

Upvotes: 0

Iznogood

Reputation: 12843

Are you trying to prevent users from posting HTML tags instead of plain strings? Because if that is what you want, you just need strip_tags(),

which will remove any HTML tags from the string.
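As a rough illustration, strip_tags can also serve as a cheap "does this input contain tags at all?" check by comparing the input with its stripped version. The helper name containsTags is hypothetical, and this does not validate anything: malformed markup is flagged just like valid markup.

```php
<?php
// Hypothetical helper: true if strip_tags() removed anything from $input.
function containsTags(string $input): bool
{
    return strip_tags($input) !== $input;
}

$plain  = 'sdkjshdkivdfadfsdf';
$markup = '<div>sdfsdfsdf<label>dghdhdgh</label> fdsgfgdfgfd</div>';

var_dump(containsTags($plain));   // bool(false)
var_dump(containsTags($markup));  // bool(true)

// Tags are dropped, text content is kept.
echo strip_tags($markup), PHP_EOL; // sdfsdfsdfdghdhdgh fdsgfgdfgfd
```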

Upvotes: 2

a1ex07

Reputation: 37364

You can use DOMDocument's loadHTML method.

Upvotes: 5

CaseySoftware

Reputation: 3125

Do you mean HTML or XHTML?

The HTML standard and interpretation are so loose that your first snippet might work. It won't be pretty but you might get something.

XHTML is quite a bit more strict and at minimum will expect your snippet to be well-formed (all opened tags are closed; tags can nest but not overlap) and may throw warnings if you have unrecognized elements or attributes.

Something like Tidy - http://php.net/manual/en/book.tidy.php - is probably a good start. Once you load your snippet using that, you can use tidy_error_count or tidy_get_error_buffer to see if it's "okay enough" for your needs.
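A minimal sketch of that approach follows, assuming the tidy extension is installed. The helper name tidyDiagnostics is hypothetical. It also reads tidy_warning_count, since Tidy may classify a problem as a warning rather than an error, in which case the error count alone would be too lenient.

```php
<?php
// Hypothetical helper: returns Tidy's error and warning counts for $html,
// or null when the tidy extension is not available or parsing fails.
function tidyDiagnostics(string $html): ?array
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $tidy = tidy_parse_string($html, [], 'utf8');
    if ($tidy === false) {
        return null;
    }
    return [
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'buffer'   => (string) tidy_get_error_buffer($tidy),
    ];
}

$report = tidyDiagnostics('sdkjshdk<div>jd</h3>ivdfadfsdf');
if ($report === null) {
    echo "tidy extension not loaded", PHP_EOL;
} else {
    printf("%d error(s), %d warning(s)\n", $report['errors'], $report['warnings']);
    echo $report['buffer'], PHP_EOL;
}
```

What counts as "okay enough" is then a threshold you choose on those two counts.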

Upvotes: 2
