FlyingCat

Reputation: 14270

How to extract text from HTML markup

I am trying to extract all the text from HTML data that is entered by the user.

I have html like the following

  <em>first part</em> of texts here

    <table>
    ......
    ......
    </table>

<em>second part</em> of texts

I use jQuery:

var project = [];

$(htmlData).contents().each(function(){
    if($(this).is('table')){
        //do something with table
    }else{
        if(this.nodeType === 3){ // Text node
            project.push($(this).text());
        }else if(this.nodeType === 1){ // Element node
            project.push(this.outerHTML);
        }
    }
});

The array ends up like

array(0=>'<em>first part</em>', 1=>'of texts here',2=>'<em>second part</em>',3=>'of texts')

I was hoping to get an array like the following

array(0=>'<em>first part</em>of texts here',1=>'<em>second part</em>of texts');

How do I accomplish this? Thanks for the help!

Upvotes: 4

Views: 124

Answers (2)

Oriol

Reputation: 288550

DEMO: http://jsfiddle.net/Cbey9/2/

var project =[];

$('#htmlData').contents().each(function(){
    if($(this).is('table')){
        //do something with table
    }else{
        var txt = (
                this.nodeType === 3  ?  $(this).text()  :
                (this.nodeType === 1  ?  this.outerHTML  :  '')
            ).replace(/\s+/g,' ') // Collapse whitespaces
            .replace(/^\s/,'') // Remove whitespace at the beginning
            .replace(/\s$/,''); // Remove whitespace at the end
        if(txt !== ''){ // Ignore empty
            project.push(txt);
        }
    }
});

I misunderstood your problem at first. If you want to split at tables, you could use

var project =[''];

$('#htmlData').contents().each(function(){
    if($(this).is('table')){
        project.push('');
        //do something with table
    }else{
        project[project.length-1] += (
            this.nodeType === 3  ?  $(this).text()  :
            (this.nodeType === 1  ?  this.outerHTML  :  '')
        );
    }
});
for(var i=0; i<project.length; ++i){
    project[i] = project[i].replace(/\s+/g,' ') // Collapse whitespaces
    .replace(/^\s/,'') // Remove whitespace at the beginning
    .replace(/\s$/,''); // Remove whitespace at the end
}

DEMO: http://jsfiddle.net/Cbey9/3/
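
For reference, the same split-at-tables idea can be sketched without jQuery. This is only an illustration: `splitAtTables` is a hypothetical helper name, not part of jQuery or the DOM API, and it assumes each node exposes `nodeType`, `nodeName`, `textContent` and `outerHTML` as real DOM nodes do. Unlike the snippet above, it also drops chunks that end up empty.

```javascript
// Hypothetical helper: concatenates sibling nodes into strings,
// starting a new string every time a <table> element is met.
function splitAtTables(nodes) {
    var parts = [''];
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeType === 1 && node.nodeName.toUpperCase() === 'TABLE') {
            parts.push('');                               // table ends the current chunk
        } else if (node.nodeType === 3) {
            parts[parts.length - 1] += node.textContent;  // text node
        } else if (node.nodeType === 1) {
            parts[parts.length - 1] += node.outerHTML;    // element node
        }
    }
    return parts
        .map(function (s) { return s.replace(/\s+/g, ' ').trim(); }) // collapse and trim whitespace
        .filter(function (s) { return s !== ''; });                  // drop empty chunks
}
```

In a browser it could be called as `splitAtTables(document.getElementById('htmlData').childNodes)`, assuming the markup lives in an element with the id `htmlData` as in the demos.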

Upvotes: 1

Daniel Gabado

Reputation: 284

Wrap the text you want in spans with a specific class (this won't alter the layout):

<span class="phrase"><em>first part</em> of texts here</span>

    <table>
    ......
    ......
    </table>

<span class="phrase"><em>second part</em> of texts</span>

And then you can get them:

var project = [];

$('span.phrase').each(function() {
    project.push($(this).html()); // inner HTML of each span, <em> tags included
});

Upvotes: 1
