kevin

Reputation: 2014

Web scraping - Python

How can I extract the entire content within "td"?

<td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 
    <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td>

I tried this:

desc = data.xpath("//td/text()")
print(desc)

But it returns only the first sentence:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 

I would like to have the output in the following format:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

I also tried:

desc = data.xpath("//td//text()")
print(desc)

The output looks like this:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 
8 entire dolls per set! Octuple the presents!

I would prefer everything on a single line:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

Upvotes: 2

Views: 163

Answers (1)

kevin

Reputation: 2014

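This worked. text_content() returns the text of the element together with the text of all its children, with no markup: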

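# xpath() returns a list of elements, so pick the first matching <td>
desc = data.xpath("//td")
# text_content() is available on elements parsed with lxml.html
print(desc[0].text_content())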
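
For anyone who wants to try this end to end, here is a minimal sketch, assuming the page is parsed with lxml.html. The HTML string is the snippet from the question wrapped in a table, and cell_text is a hypothetical helper name, not part of lxml. It also collapses the line break and indentation between the two sentences, which text_content() alone keeps.

```python
import lxml.html

# Snippet from the question, wrapped in <table><tr> so the parser keeps the <td>.
HTML = """<table><tr>
<td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
    <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td>
</tr></table>"""


def cell_text(element):
    """Hypothetical helper: all text under element, whitespace collapsed."""
    # text_content() concatenates the element's text and its children's text;
    # split() + join() turns newlines and runs of spaces into single spaces.
    return " ".join(element.text_content().split())


data = lxml.html.fromstring(HTML)
cells = data.xpath("//td")  # a list of matching elements
desc = cell_text(cells[0])
print(desc)

# Alternative without text_content(): join every descendant text node.
alt = " ".join("".join(cells[0].xpath(".//text()")).split())
```

Both desc and alt should hold the two sentences on one line, separated by a single space. If the page has several td cells, loop over cells instead of taking cells[0].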

Upvotes: 2
