Scrapy - getting HTML without outer tag

Question

I'm scraping a page, using Scrapy. I want the HTML contents of the TD with "text" class:


  
<td class="text">
    A bunch of HTML

    Some random text
</td>

This is my Scrapy line:

for body in response.css('td.text'):
  yield {'body': body.extract()}

Which works - except it includes the surrounding td:

[
  {"body": "<td class=\"text\"> A bunch of HTML  Some random text </td>"}
]

This is what I want:

[
  {"body": "A bunch of HTML  Some random text
 "}
]

Halp? :)

Mohamed Yasser · Accepted Answer

Try this selector:

response.css('td.text *')

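The * matches every element nested inside the td, so each inner tag comes back as its own selector. Note that text sitting directly inside the td (not wrapped in a tag) is not matched, and tags nested inside other tags will show up more than once in the results.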
