Blair McMillan

Reputation: 5349

Close tags from a truncated HTML string

I have inherited a site with a news section that displays a summary of the news article. For whatever reason the creators decided that displaying the first X characters of the article would be fine. Of course this very quickly led to the summary being something like:

<p>What a mighty fine <a href="blah">da
<p>What a mighty fine and warm <a href="htt
<p>His name was &quot;Emil&qu

Which quite obviously screws with the page, especially when the opening tags aren't even closed.

What I'm after is a way to close all open tags within the string being taken. I really really don't want to use regex to do it. I'm sure there's a nice parser that can do it easily, I just can't seem to find it right now.

Upvotes: 2

Views: 1953

Answers (3)

robjmills

Reputation: 18598

Have you taken a look at Tidy?

Example:

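// Emit only the repaired fragment, without an <html>/<head>/<body> wrapper.
$options = array("show-body-only" => true);
$tidy = tidy_parse_string("<B>Hello</I> How are <U> you?</B>", $options);
// Close unclosed tags and fix mismatched ones.
tidy_clean_repair($tidy);
echo $tidy;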

Outputs:

<b>Hello</b> How are <u>you?</u> 

Upvotes: 2

lonesomeday

Reputation: 237845

I would install the PHP bindings for Tidy. You can then clean up an HTML fragment with the following code:

<?php

$fragment = '<p>What a mighty fine <a href="blah">da';

$tidy = new tidy();

// 'show-body-only' returns just the repaired fragment, not a full HTML document.
$tidy->parseString($fragment, array('show-body-only' => true), 'utf8');
$tidy->cleanRepair();

echo $tidy;

Upvotes: 1

Emil Vikström

Reputation: 91922

The best thing is probably to find a better algorithm for generating the excerpt, for example by running strip_tags before the truncation.

Otherwise, how will you handle errors that are hard to detect programmatically, such as a truncated attribute (<p>What a mighty fine and warm <a href="htt) or a truncated entity (<p>His name was &quot;Emil&qu)?
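
As a rough sketch of that idea, and only under some assumptions: the helper name make_excerpt() is made up for illustration, the mbstring extension is assumed to be available, and the article body is assumed to be UTF-8. It strips the tags first, decodes entities so that something like &quot; counts as a single character and cannot be cut in half, truncates, and then re-encodes the result for output.

```php
<?php

/**
 * Build a plain-text excerpt from an HTML article body.
 * make_excerpt() is a hypothetical helper, not part of the original site.
 */
function make_excerpt(string $html, int $maxChars = 200): string
{
    // Remove all markup first, so truncation can never cut a tag in half.
    $text = strip_tags($html);

    // Decode entities so each one counts as a single character.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Truncate by characters (not bytes) and mark the cut.
    if (mb_strlen($text, 'UTF-8') > $maxChars) {
        $text = rtrim(mb_substr($text, 0, $maxChars, 'UTF-8')) . '...';
    }

    // Re-encode special characters so the excerpt is safe to print in HTML.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$article = '<p>His name was &quot;Emil&quot; and he wrote <a href="http://example.com">this</a>.</p>';

echo make_excerpt($article, 20); // His name was &quot;Emil&quot;...
```

The excerpt comes back as plain text, so any links or formatting in the summary are lost. If the summary has to keep its markup, repairing the truncated string with Tidy is the other route.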

Upvotes: 2
