Matching multiple <p> tags in scrapy

Question

I have something like the following HTML:


  
    Text lorem ipsum... 
    lorem ipsum...
  
  lorem ipsum 
     lorem ipsum lorem ipsum
    lorem ipsum...lorem ipsum...lorem ipsum...lorem ipsum...

More generally, I have a list of <p> tags with a few tags inside them.

I would like to get the text of all the <p> tags, minus the tags inside them. By that, I mean just the text in the "articleBody" div class.

What I have is:

response.xpath('string(//div[@class="articleBody"]//p)').extract()

but that only returns the text of the first <p>.

Any help would be appreciated.

KorreyD · Accepted Answer

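Give this a shot:

for node in response.xpath('//div[@class="articleBody"]//p'):
    # string() is evaluated relative to each <p>, so inner tags are dropped
    # and every paragraph yields its own text
    print(node.xpath('string()').extract())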

...then, instead of just printing them like I did, you can concatenate the strings, add them to a list, or whatever you need.

There is also the string-join() function in XPath 2.0, but Scrapy's selectors (built on lxml) only support XPath 1.0.

More info about string-join and related functions here: http://www.w3.org/TR/xpath-functions/#func-string-join
