Reputation: 2625
For now I got:
public static function splitContent($string, $lenght, $maxCols){
if (strlen($string)<($lenght*$maxCols) && strlen($string)> $lenght){
$string = wordwrap($string, $lenght, "||"); //assume your string doesn't contain `||`
$parts = explode("||", $string);
$result='';
foreach ($parts as $part){
$result=$result.'<div>'.$part.'</div>';
}
return $result;
}
return $string;
}
and it works well when it comes to not breaking words but it often split HTML formatting tags like <span </div><div> style=....>
how to prevent that? I see there is many problems like this when splitting html formatted string. Does anyone know about library to do it without hassle. it would be great if it would count only visible characters
Upvotes: 3
Views: 3535
Reputation: 1694
I had to split any random HTML text into 2 equal parts to display them in 2 columns next to each other.
The logic below splits the HTML into 2 parts taking into account the word boundaries and the HTML tags. You can extend it splitting the HTML into multiple divs with a bit more effort.
I have used @jave.web's logic to close the undisclosed HTML tags.
// splitHtmlTextIntoTwoEqualColumnsTrait.php
<?php
/**
* TCPDF doesn't support to have a 2 columns text where the length of the text is limited and the height of the 2 columns are equal.
*
* This trait calculates the middle of the text, split it into 2 parts and returns with them
* Keeps the word boundaries and takes care of the HTML tags too! There is no broken HTML tag after the split.
*/
trait splitHtmlTextIntoTwoEqualColumnsTrait
{
protected function splitHtmlTextIntoTwoEqualColumns(string $htmlText): array
{
// removes unnecessary characters and HTML tags
$htmlText = str_replace("\xc2\xa0", ' ', $htmlText);
$htmlText = html_entity_decode($htmlText);
$pureText = $this->getPureText($htmlText);
// calculates the length of the text
$fullLength = strlen($pureText);
$halfLength = ceil($fullLength / 2);
$words = explode(' ', $pureText);
// finds the word which is in the middle of the text
$middleWordPosition = $this->getPositionOfMiddleWord($words, $halfLength);
// iterates through the HTML and split the text into 2 parts when it reaches the middle word.
$columns = $this->splitHtmlStringInto2Strings($htmlText, $middleWordPosition);
return $this->closeUnclosedHtmlTags($columns, $halfLength * 2);
}
private function getPureText(string $htmlText): string
{
$pureText = strip_tags($htmlText);
$pureText = preg_replace('/[\x00-\x1F\x7F]/', '', $pureText);
return str_replace(["\r\n", "\r", "\n"], ['', '', ''], $pureText);
}
/**
* finds the word which is in the middle of the text
*/
private function getPositionOfMiddleWord(array $words, int $halfLength): int
{
$wordPosition = 0;
$stringLength = 0;
for ($p = 0; $p < count($words); $p++) {
$stringLength += mb_strlen($words[$p], 'UTF-8') + 1;
if ($stringLength > $halfLength) {
$wordPosition = $p;
break;
}
}
return $wordPosition;
}
/**
* iterates through the HTML and split the text into 2 parts when it reaches the middle word.
*/
private function splitHtmlStringInto2Strings(string $htmlText, int $wordPosition): array
{
$columns = [
1 => '',
2 => '',
];
$columnId = 1;
$wordCounter = 0;
$inHtmlTag = false;
for ($s = 0; $s <= strlen($htmlText) - 1; $s++) {
if ($inHtmlTag === false && $htmlText[$s] === '<') {
$inHtmlTag = true;
}
if ($inHtmlTag === true) {
$columns[$columnId] .= $htmlText[$s];
if ($htmlText[$s] === '>') {
$inHtmlTag = false;
}
} else {
if ($htmlText[$s] === ' ') {
$wordCounter++;
}
if ($wordCounter > $wordPosition && $columnId < 2) {
$columnId++;
$wordCounter = 0;
}
$columns[$columnId] .= $htmlText[$s];
}
}
return array_map('trim', $columns);
}
private function closeUnclosedHtmlTags(array $columns, int $maxLength): array
{
$column1 = $columns[1];
$unclosedTags = $this->getUnclosedHtmlTags($columns[1], $maxLength);
foreach (array_reverse($unclosedTags) as $tag) {
$column1 .= '</' . $tag . '>';
}
$column2 = '';
foreach ($unclosedTags as $tag) {
$column2 .= '<' . $tag . '>';
}
$column2 .= $columns[2];
return [$column1, $column2];
}
/**
* https://stackoverflow.com/a/26175271/5356216
*/
private function getUnclosedHtmlTags(string $html, int $maxLength = 250): array
{
$htmlLength = strlen($html);
$unclosed = [];
$counter = 0;
$i = 0;
while (($i < $htmlLength) && ($counter < $maxLength)) {
if ($html[$i] == "<") {
$currentTag = "";
$i++;
if (($i < $htmlLength) && ($html[$i] != "/")) {
while (($i < $htmlLength) && ($html[$i] != ">") && ($html[$i] != "/")) {
$currentTag .= $html[$i];
$i++;
}
if ($html[$i] == "/") {
do {
$i++;
} while (($i < $htmlLength) && ($html[$i] != ">"));
} else {
$currentTag = explode(" ", $currentTag);
$unclosed[] = $currentTag[0];
}
} elseif ($html[$i] == "/") {
array_pop($unclosed);
do {
$i++;
} while (($i < $htmlLength) && ($html[$i] != ">"));
}
} else {
$counter++;
}
$i++;
}
return $unclosed;
}
}
how to use it:
// yourClass.php
<?php
declare(strict_types=1);
class yourClass
{
use splitHtmlTextIntoTwoEqualColumnsTrait;
public function do()
{
// your logic
$htmlString = '';
[$column1, $column2] = $this->splitHtmlTextIntoTwoEqualColumns($htmlString);
}
}
Upvotes: 1
Reputation: 15032
From what I know this can not be achieved by simple string splitting because as you already found out - there is a very high possibility of breaking html.
However you could:
As for visible characters - PHP itself doesn't know what CSS your elements have - but e.g. if you would load it as an object you could getAttribute('style')
and search your "hide css" in that :)
Note: both cases 1) and 2) requires a bit performance, sou if you are applying this to some higher traffic site you should consider some kind of caching for these results.
NOTE: this function assumes XHTML ! (expects selfclosing tags as <img>
to be selfeclosed as <img />
And please note that I just made this quick so it might not be best nor efficiant way to do it :)
You can see it work at http://ideone.com/erSDlg
//PHP
function closeTags( &$html, $length = 20 ){
$htmlLength = strlen($html);
$unclosed = array();
$counter = 0;
$i=0;
while( ($i<$htmlLength) && ($counter<$length) ){
if( $html[$i]=="<" ){
$currentTag = "";
$i++;
if( ($i<$htmlLength) && ($html[$i]!="/") ){
while( ($i<$htmlLength) && ($html[$i]!=">") && ($html[$i]!="/") ){
$currentTag .= $html[$i];
$i++;
}
if( $html[$i] == "/" ){
do{ $i++; } while( ($i<$htmlLength) && ($html[$i]!=">") );
} else {
$currentTag = explode(" ", $currentTag);
$unclosed[] = $currentTag[0];
}
} elseif( $html[$i]=="/" ){
array_pop($unclosed);
do{ $i++; } while( ($i<$htmlLength) && ($html[$i]!=">") );
}
} else{
$counter++;
}
$i++;
}
$result = substr($html, 0, $i-1);
$unclosed = array_reverse( $unclosed );
foreach( $unclosed as $tag ) $result .= '</'.$tag.'>';
print_r($result);
}
$html = "<div>123890<span>1234<img src='i.png' /></span>567890<div><div style='test' class='nice'>asfaasf";
closeTags( $html, 20 );
Upvotes: 2