Reputation: 2897
I am trying to extract the text of each <dd></dd> tag on the page
with this command in the Scrapy shell:
[w.strip() for w in response.xpath('//ul[@class="attribute-list"]/li/dl/dd/text()').extract()]
The dd tag looks like this:
<dd> Edelstahl <br>gebürstet (silberfarben) </dd>
scrapy returns:
'Edelstahl', 'gebürstet (silberfarben)', more dd elements...
It is important that I get either only the first part, "Edelstahl", or both parts combined, "Edelstahl gebürstet (silberfarben)", but in any case not two separate elements from one dd tag. How can this be achieved?
Upvotes: 1
Views: 62
Reputation: 2091
You could use:
[w.xpath('string()').extract_first().strip() for w in response.xpath('//ul[@class="attribute-list"]/li/dl/dd')]
Upvotes: 1
Reputation: 3717
Since you have tags inside your dd, it is better to use something like:
from w3lib.html import remove_tags
print([remove_tags(w).strip() for w in response.xpath('//ul[@class="attribute-list"]/li/dl/dd').extract()])
It will give you clean text for each dd element.
Upvotes: 1