merlin

Reputation: 2897

How to extract only one element per tag with scrapy?

I am trying to extract the text of the <dd></dd> tags on a page with this command in the Scrapy shell:

[w.strip() for w in response.xpath('//ul[@class="attribute-list"]/li/dl/dd/text()').extract()]

The dd tag looks like this:

<dd> Edelstahl <br>gebürstet (silberfarben) </dd>

scrapy returns:

'Edelstahl', 'gebürstet (silberfarben)', more dd elements...

I need either only the first element, "Edelstahl", or both combined, "Edelstahl gebürstet (silberfarben)", but in any case not two elements from one dd tag. How can this be achieved?

Upvotes: 1

Views: 62

Answers (2)

Joaquin

Reputation: 2091

You could use:

[w.xpath('string()').extract_first().strip() for w in response.xpath('//ul[@class="attribute-list"]/li/dl/dd')]

Upvotes: 1

vezunchik

Reputation: 3717

Since your dd contains tags, it is better to use something like:

from w3lib.html import remove_tags
print([remove_tags(w).strip() for w in response.xpath('//ul[@class="attribute-list"]/li/dl/dd').extract()])

This gives you the plain text of each dd element as a single string.

Upvotes: 1
