Reputation: 197
I have the following issue when trying to scrape information from a website using Scrapy.
I'm trying to get all the text inside a <p> tag, but in some cases those tags contain not just text but also an <a> tag, and my code stops collecting the text when it reaches that tag.
This is my XPath expression; it works properly when there are no tags nested inside the <p>:
description = descriptionpath.xpath("span[@itemprop='description']/p/text()").extract()
Upvotes: 2
Views: 2733
Reputation: 12194
Posting Pawel Miech's comment as an answer, since it has helped many of us so far and contains the right answer:
Append //text() to the end of the XPath so that text is extracted recursively, from every descendant of <p> rather than only from its direct text nodes.
Your XPath would then look like this:
span[@itemprop='description']/p//text()
Upvotes: 3