Bruno Le Duic

Reputation: 261

Limit input length of text that contains HTML tags

I have a PHP website in which I can manage articles. On the "Add a new article" form there is a rich-text box (it allows HTML input) whose character count I'd like to limit. I check the length on the server side using the strlen() function.

The problem is that strlen() seems to return a number that is far too big. I tried html_entity_decode() to get the HTML tags out of the string, but the resulting string length still seems wrong.

Upvotes: 2

Views: 3957

Answers (2)

Alix Axel

Reputation: 154563

html_entity_decode() only decodes HTML entities; it doesn't remove HTML tags. Try:

strlen(strip_tags(html_entity_decode($string)));

Or the multi-byte equivalent (assuming the string is UTF-8 encoded; PHP 8 rejects 'auto' as an encoding name here):

mb_strlen(strip_tags(html_entity_decode($string)), 'UTF-8');
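
A minimal sketch of how this could be wrapped into a server-side check, assuming the input is UTF-8. The function name visibleLength() and the sample string are hypothetical, not from the question. This variant strips the tags before decoding the entities (the reverse of the one-liner), so that escaped markup such as &lt;b&gt; is counted as text rather than removed as a tag:

```php
<?php
// Hypothetical helper: counts the visible characters of an HTML string.
function visibleLength(string $html, string $encoding = 'UTF-8'): int
{
    // Remove real tags first; escaped markup is still an entity here,
    // so strip_tags() leaves it alone.
    $text = strip_tags($html);

    // Then turn entities into the characters they represent.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, $encoding);

    // Count characters, not bytes.
    return mb_strlen($text, $encoding);
}

$input = '<p>Caf&eacute; <b>cr&egrave;me</b></p>';

var_dump(strlen($input));        // bytes, including tags and entity names
var_dump(visibleLength($input)); // int(10), the length of "Café crème"
```

The returned number can then be compared against whatever limit the form enforces.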

Upvotes: 6

hakre

Reputation: 197832

You want to get the number of characters, but you don't want to count HTML markup.

You can do that with an HTML parser such as DOMDocument. Load the document (or fragment), obtain the body tag, which represents the document's content, get its nodeValue, normalize its whitespace, and then use a UTF-8 compatible character counting function:

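$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');
// <body> holds the document's content
$body = $doc->getElementsByTagName('body')->item(0);
// nodeValue is the concatenated text of all descendant nodes
$text = $body->nodeValue;
// collapse each run of whitespace into a single space
$text = trim(preg_replace('/\s+/u', ' ', $text));
printf("Length: %d character(s).\n", mb_strlen($text, 'utf-8'));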

Example input test.html:

<body>
    <div style='float:left'><img src='../../../../includes/ph1.jpg'></div>

    <label style='width: 476px; height: 40px; position: absolute;top:100px; left: 40px; z-index: 2; background-color: rgb(255, 255, 255);; background-color: transparent' >
    <font size="4">1a. Nice to meet you!</font>
    </label>
    <img src='ENG_L1_C1_P0_1.jpg' style='width: 700px; height: 540px; position: absolute;top:140px; left: 40px; z-index: 1;' />

    <script type='text/javascript'> 


    swfobject.registerObject('FlashID');
    </script>

    <input type="image" id="nextPageBtn" src="../../../../includes/ph4.gif" style="position: absolute; top: 40px; left: 795px; ">

</body>

Example output:

Length: 58 character(s).

The normalized text is:

1a. Nice to meet you! swfobject.registerObject('FlashID');

Note that this count includes text that is not displayed, such as the text inside <script> tags.
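
If that text should not count, one option is to remove those elements from the DOM before reading the text. This is only a sketch under a few assumptions: the input is a UTF-8 HTML string rather than a file, the function name textLengthWithoutScripts() is made up for illustration, and the XML declaration prefix is a commonly used workaround to make libxml read the input as UTF-8:

```php
<?php
// Hypothetical helper: visible text length of an HTML string,
// ignoring the contents of <script> and <style> elements.
function textLengthWithoutScripts(string $html): int
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: the prefix below hints UTF-8 to the HTML parser.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the matches into an array, then detach each node.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return 0; // nothing that could hold visible text
    }

    // Collapse whitespace runs, as in the snippet above.
    $text = trim((string) preg_replace('/\s+/u', ' ', $body->textContent));

    return mb_strlen($text, 'UTF-8');
}

$html = '<div>1a. Nice to meet you!</div>'
      . '<script>swfobject.registerObject("FlashID");</script>';

var_dump(textLengthWithoutScripts($html)); // int(21)
```

With the script content removed, only "1a. Nice to meet you!" is counted.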

Upvotes: 1
