Sybghatallah Marwat

Reputation: 312

Split long string into small chunks without breaking HTML tags and words

I am breaking long text into smaller chunks using a while loop. My string contains HTML, and I don't want the user to see stray opening or closing angle brackets from tags that were cut in half.

My template string contains the following text:

var text = "I love Stackoverflow. It helps me lot and Bla bla bla bla bla bla ";

var textString = '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
    '<u> Working from home</u></h4><p style="margin:30px;">' + text + '<p></div>';

I am using the following method:

var i = true;
var start = 0;
var end = 20;
var increment = 0;
var incremented = 0;
var val1 = textString.slice(start, end);
while (i == true) {
    val1 = data.slice(start, end);
    var check = val1.endsWith(' ');
    while (check == false) {
        end = end + 1;
        incremented = incremented + 1;
        val1 = data.slice(start, end);
        if (val1.endsWith(' ')) {
            check = false;
        } else {
            check = true;
        }
        end = end + 20 + incremented;
        start = start + 20 + incremented;
        if (start > textString.length) {
            i = false;
        }
    }
}

An example is here:

    var text1 = 'I love Stackoverflow. It helps me lot and Bla bla bla bla bla bla';
    var text2 = 'Some Random Text';
    var text3 = 'Some Random Text';
    var text4 = 'Some Random Text';
    var text5 = 'Some Random Text';
    var text6 = 'Some Random Text';

    var textString = '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
        '<u> text1 </u></h4><p style="margin:30px;">'+text2+'<p></div>' +
        '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
        '<u> text3</u></h4><p style="margin:30px;">'+text4+'<p></div>' +
        '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
        '<u>text5</u></h4><p style="margin:30px;">'+text6+'<p></div>';

The output I need should look like this:

    arr[0] = '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
        '<u> text1</u></h4><p style="margin:30px;">'+text2+'<p></div>';

    arr[1] = '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
        '<u> text3</u></h4><p style="margin:30px;">'+text4+'<p></div>';

    arr[2] = '<div class="row page col-md-12 "><h4 style="margin-left:20px;"> ' +
        '<u> text5</u></h4><p style="margin:30px;">'+text6+'<p></div>';

This is my current output: [image]

Upvotes: 1

Views: 1776

Answers (2)

Massaynus

Reputation: 442

You could split the string on spaces:

let wordsArray = text.split(" ")

Then group the words into chunks of whatever size you want:

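const chunks = []
const wordsInChunkCount = 100
let current = []
wordsArray.forEach(item => {
  current.push(item)
  // Flush once the chunk holds the requested number of words
  if (current.length === wordsInChunkCount) {
    chunks.push(current.join(' '))
    current = []
  }
})
// Keep the final, partially filled chunk
if (current.length > 0) {
  chunks.push(current.join(' '))
}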

After that, you will have your chunks in the chunks array.
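The same idea can be wrapped in a small reusable function. This is only a sketch, and it assumes plain text input: splitting HTML on spaces would cut through any tag that carries attributes. The names `chunkWords` and `sample` are hypothetical and not part of the code above.

```javascript
// Hypothetical helper: group whitespace-separated words into chunks
// of at most `size` words each.
function chunkWords(text, size) {
  // split(/\s+/) treats runs of whitespace as one separator;
  // filter(Boolean) drops the empty entries left by leading/trailing space
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(' '));
  }
  return chunks;
}

const sample = 'I love Stackoverflow. It helps me lot';
console.log(chunkWords(sample, 3));
// → [ 'I love Stackoverflow.', 'It helps me', 'lot' ]
```

Because the last slice may be shorter than `size`, the trailing words are kept rather than dropped.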

Upvotes: 0

Ralph Ritoch

Reputation: 3440

HTML DOM nodes include their content, so you can't split them without breaking them. The following code converts your string into a DOM tree, splits off all the child nodes, and recombines them without breaking words or HTML, based on the length of their text content.

If your data is bad and, for example, has a single paragraph that takes up more than one page, or a long series of letters with no spaces, then you will likely need to come up with custom solutions for each type of HTML tag and long series of characters.

Even with this solution, you may find that additional effort is needed to keep pre tags within your page targets.

This function takes two arguments, your string and the maximum length you would like for the textContent in characters.

var shard = function(str, len) {

    var el = document.createElement('div');
    el.innerHTML = str;
    var child = el.firstChild;

    var parts = [];
    while (child) {
        if (child.nodeType == 3) {
            var texts = child.nodeValue.split('')
                .reduce(function(a, b) {
                    if (b.split(/\s/).length > 1) {
                        a[a[a.length - 1].length > 0 ? a.length : a.length - 1] = b;
                        a[a.length] = '';
                    } else {
                        a[a.length - 1] = a[a.length - 1] + b;
                    }
                    return a;
                }, ['']);
            for (var idx = 0; idx < texts.length; idx++) {
                parts.push(document.createTextNode(texts[idx]));
            }
        } else {
            parts.push(child);
        }
        child = child.nextSibling;
    }

    var textParts = parts.map(function(el) { return el.textContent; });

    
    var partsOut = [''];

    var t = 0;

    for(var idx=0; idx<parts.length; idx++) {

        if ((t + textParts[idx].length) > len) {
          partsOut[partsOut.length] = parts[idx].nodeType == 3 ? 
              parts[idx].nodeValue : parts[idx].outerHTML;
          t = textParts[idx].length;
        } else {
          partsOut[partsOut.length - 1] = partsOut[partsOut.length - 1] + (
             parts[idx].nodeType == 3 ? 
             parts[idx].nodeValue : 
             parts[idx].outerHTML
         );
          t += textParts[idx].length;
        }

        
    }

    return partsOut;

};

This is probably not what you want to use in a production environment, but it does attempt, where possible, to break HTML into unbroken pieces whose text content stays near a target length.
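Where no DOM is available (for example outside a browser), a rougher string-only variant is possible. The sketch below is a different technique from the function above, not a port of it: it tokenizes the string into whole tags and whole words with a regular expression, then packs tokens until the visible text would exceed the limit. It assumes well-formed markup with no `>` inside attribute values, and the names `tokenizeHtml` and `packTokens` are hypothetical. It never cuts inside a tag or a word, but unlike the DOM approach it can separate an opening tag from its closing tag.

```javascript
// Sketch only: assumes well-formed markup and no '>' inside attribute values.
function tokenizeHtml(html) {
  // A token is a whole tag, a word plus its trailing whitespace,
  // a run of whitespace, or a stray '<' that does not start a tag.
  return html.match(/<[^>]+>|[^<\s]+\s*|\s+|</g) || [];
}

function packTokens(html, maxTextLength) {
  const chunks = [];
  let current = '';
  let textLength = 0; // visible characters in the current chunk; tags count as zero
  for (const token of tokenizeHtml(html)) {
    const visible = token.startsWith('<') ? 0 : token.length;
    // Start a new chunk only at a text token, so a tag never opens a split by itself
    if (current !== '' && visible > 0 && textLength + visible > maxTextLength) {
      chunks.push(current);
      current = '';
      textLength = 0;
    }
    current += token;
    textLength += visible;
  }
  if (current !== '') chunks.push(current);
  return chunks;
}

console.log(packTokens('<p>I love Stackoverflow. It helps</p>', 10));
// → [ '<p>I love ', 'Stackoverflow. ', 'It helps</p>' ]
```

A word longer than the limit still becomes its own chunk, and trailing whitespace counts toward the visible length, so the limit is a target rather than a hard cap.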

Upvotes: 2
