djohon

Reputation: 879

Scrapy: how to get the text of a tag inside another tag

I have html paragraphs like this:

<p>Hello <strong>I'm G </strong></p>

I'm trying to get all the text inside the p, including the part inside the strong tag. I tried the code below, but I only get "Hello":

for text in response.css("div.entry-content"):
    yield {
        "parag": text.css("p::text").extract(),
    }

I also tried something like first-child in CSS, but this time nothing was returned:

"parag": text.css("p:strong::text").extract()

Edit: Instead of strong, it could be another tag, so the goal is to get the text of the first child.

Upvotes: 1

Views: 1310

Answers (1)

JkShaw

Reputation: 1947

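Here's a working example. The space in 'p ::text' matters: it selects the text nodes of p and of all its descendants, whereas 'p::text' only selects the text directly inside p: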

>>> from scrapy.http import HtmlResponse
# body is a str here, so an encoding must be given (required on Python 3)
>>> response = HtmlResponse(url="Test HTML String", body="<p>Hello <strong>I'm G </strong> <b>I write code</b></p>", encoding="utf-8")

# Own text plus first child (the first two text nodes)
>>> ' '.join(t.strip() for t in response.css('p ::text').extract()[:2]).strip()
"Hello I'm G"

# All children
>>> ' '.join(t.strip() for t in response.css('p ::text').extract()).strip()
"Hello I'm G  I write code"
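
The "all children" result keeps a double space because one of the extracted text nodes is only whitespace. A minimal sketch of one way to tidy that up, using plain Python on the list that extract() returns. The helper name join_text and its limit parameter are made up for illustration and are not part of Scrapy:

```python
def join_text(fragments, limit=None):
    """Join text fragments into one string separated by single spaces.

    fragments: a list of strings, e.g. the list returned by
               response.css('p ::text').extract()
    limit:     hypothetical option to keep only the first N fragments
    """
    if limit is not None:
        fragments = fragments[:limit]
    # str.split() with no arguments drops empty pieces and collapses
    # any run of whitespace, so re-joining yields single spaces only.
    return " ".join(" ".join(fragments).split())


# The same text nodes the example paragraph produces
fragments = ["Hello ", "I'm G ", " ", "I write code"]
print(join_text(fragments))           # Hello I'm G I write code
print(join_text(fragments, limit=2))  # Hello I'm G
```

In a spider this could be called as join_text(text.css("p ::text").extract()) inside the loop from the question.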

Upvotes: 4
