JVG

Reputation: 21150

Method that gets an element's child text, regardless of whether in <p> tag

I'm building a scraper in Node.js and have run into a small problem. I'm trying to write a function that gets an element's text regardless of whether it is wrapped in a <p> tag, wrapped in a <span>, or sits directly inside a <div>.

The following currently works ONLY for text contained in <p> tags:

function getDescription(product){
    var text =[];
    $('.description *')
        .each(function(i, elem) {
            var dirty = $(this).text();
            var clean = sanitize(dirty).trim();
            if (clean.length){
                text.push(clean);
            }
        });
    text.join(',');
    sanitize(text).trim();
    return text;
}

This works for code like this:

<div class="description">
    <p>Test test test</p>
</div>

But doesn't work for this:

<div class="description">
    Test test test
</div>

For reference, sanitize and trim come from Node Validator, but that's not particularly relevant to my problem: they just take a string and strip the surrounding whitespace from it.

Any ideas on how to make the one function work for BOTH cases? To add insult to injury, I'm slightly more limited because I'm using the cheerio library in Node, which replicates some of jQuery's functions but not all of them.

Upvotes: 1

Views: 235

Answers (3)

epoch

Reputation: 16615

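You can use innerText (note that this is a browser DOM property; the nodes cheerio hands back in Node generally don't provide it, so $(elem).text() is the safer equivalent there):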

var text =[];
$('.description').each(function(i, elem) {
    var dirty = elem.innerText;

    var clean = sanitize(dirty).trim();
    if (clean.length){
        text.push(clean);
    }
});

Upvotes: 0

Arun P Johny

Reputation: 388316

Use .contents() instead of *

function getDescription(product){
    var text = [];
    // .contents() returns all child nodes, including bare text nodes
    $('.description').contents()
        .each(function(i, elem) {
            var dirty = $(this).text();
            var clean = sanitize(dirty).trim();
            if (clean.length){
                text.push(clean);
            }
        });
    // join() returns a new string, so return its result rather than discarding it
    return text.join(',');
}

Upvotes: 6

Tomalak

Reputation: 338238

Use $(".description").contents().

The * selector matches only element nodes, not text nodes, so text placed directly inside the <div> is never visited.
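To make the element-node vs. text-node distinction concrete, here is a minimal sketch in plain JavaScript that needs no cheerio install. The tree is built by hand and assumes the node shape cheerio's parser uses (type 'tag' with children for elements, type 'text' with data for text). The helpers textOf, descendantElements, contents and cleanTexts are hypothetical stand-ins written for this illustration, not cheerio functions; they only approximate what $('.description *') and $('.description').contents() visit.

```javascript
// Hand-built tree (assumed shape: { type: 'tag', name, children } for elements,
// { type: 'text', data } for text nodes). Mixes loose text with a <p> child.
const description = {
  type: 'tag',
  name: 'div',
  children: [
    { type: 'text', data: '\n    Loose text\n    ' },
    { type: 'tag', name: 'p', children: [{ type: 'text', data: 'Wrapped text' }] },
    { type: 'text', data: '\n' },
  ],
};

// Concatenate the text of a node and everything below it, similar to .text().
function textOf(node) {
  if (node.type === 'text') return node.data;
  return (node.children || []).map(textOf).join('');
}

// Roughly what '.description *' visits: descendant element nodes only.
function descendantElements(node) {
  const found = [];
  for (const child of node.children || []) {
    if (child.type === 'tag') {
      found.push(child, ...descendantElements(child));
    }
  }
  return found;
}

// Roughly what .contents() visits: every direct child, text nodes included.
function contents(node) {
  return node.children || [];
}

// Trim each piece and drop the ones that are only whitespace.
function cleanTexts(nodes) {
  return nodes.map((n) => textOf(n).trim()).filter((t) => t.length > 0);
}

const viaStar = cleanTexts(descendantElements(description));
const viaContents = cleanTexts(contents(description));

console.log(viaStar);     // [ 'Wrapped text' ]
console.log(viaContents); // [ 'Loose text', 'Wrapped text' ]
```

The element-only walk never sees "Loose text" because it lives in a text node that is a direct child of the <div>, which is the same reason the original selector missed it.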

Upvotes: 3
