I am trying to scrape a website whose HTML looks something like this:
<div class="panel-heading" role="tab" id="heading727654">
<h4 class="panel-title">
<a class="collapsed" data-toggle="collapse" data-parent="#accordion" href="#collapse727654" aria-expanded="false" aria-controls="collapse727654">
<div class="product-name">
<span class="product-title">
Aubrey<br><i>AGE DEFYING THERAPY CLEANSER 3.4 OZ</i>
</span>
</div>
<div class="product-price">
<span>
$10.99 / 3.40 OZ
</span>
</a>
</h4>
</div>
<div class="panel-heading" role="tab" id="heading727655">
<h4 class="panel-title">
<a class="collapsed" data-toggle="collapse" data-parent="#accordion" href="#collapse727655" aria-expanded="false" aria-controls="collapse727654">
<div class="product-name">
<span class="product-title">
Aubrey<br><i>AGE DEFYING THERAPY LIQUID</i>
</span>
</div>
<div class="product-price">
<span>
$12.99 / 4.40 OZ
</span>
</a>
</h4>
</div>
My Python code to extract this looks like this:
def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            print node.xpath('//span[re:test(@class, "product-title")]//text()').extract()
            print node.xpath('//span[re:test(@class, "product-price")]//text()').extract()
When I run this Scrapy code, I don't get the expected output: the same content is repeated 100 times. Can someone help me with this?
You need to prepend a dot to your inner XPath expressions so that they are evaluated in the context of node. Otherwise the search starts from the root of the document, and every iteration of the loop returns the matching elements for the whole page:
def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            # The leading "." makes the search relative to the current node
            print(node.xpath('.//span[re:test(@class, "product-title")]//text()').extract())
            # "product-price" is the class of a <div>; the <span> inside it has no class
            print(node.xpath('.//div[re:test(@class, "product-price")]//text()').extract())
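To see the difference outside Scrapy, here is a minimal sketch that uses lxml directly, assuming lxml is installed (Scrapy's selectors are built on top of it). The markup is a simplified copy of the two products from the question with the h4/a wrappers dropped, the re:test() predicates are replaced with plain @class comparisons for brevity, and the helper names titles_per_heading and products are hypothetical. With the absolute path, every heading reports both titles. With the relative path, each heading reports only its own title, and products() pairs each title with the price from its product-price div.

```python
# Sketch: why "//" inside the loop repeats results, and how ".//" fixes it.
# Assumes lxml is installed; PAGE is a simplified copy of the question's HTML.
# titles_per_heading() and products() are hypothetical helper names.
from lxml import html

PAGE = """
<div class="panel-heading" id="heading727654">
  <div class="product-name"><span class="product-title">Aubrey<br><i>AGE DEFYING THERAPY CLEANSER 3.4 OZ</i></span></div>
  <div class="product-price"><span>$10.99 / 3.40 OZ</span></div>
</div>
<div class="panel-heading" id="heading727655">
  <div class="product-name"><span class="product-title">Aubrey<br><i>AGE DEFYING THERAPY LIQUID</i></span></div>
  <div class="product-price"><span>$12.99 / 4.40 OZ</span></div>
</div>
"""


def titles_per_heading(root, inner_xpath):
    """Evaluate inner_xpath once per panel-heading div and collect the results."""
    return [node.xpath(inner_xpath)
            for node in root.xpath('//div[@class="panel-heading"]')]


def products(root):
    """Return one (title, price) pair per panel-heading, using relative paths."""
    pairs = []
    for node in root.xpath('//div[@class="panel-heading"]'):
        # string() returns the text of the first matching <i> element
        title = node.xpath('string(.//span[@class="product-title"]/i)')
        # normalize-space() trims and collapses whitespace around the price
        price = node.xpath('normalize-space(.//div[@class="product-price"])')
        pairs.append((title, price))
    return pairs


root = html.document_fromstring(PAGE)

# Absolute path: every iteration searches the whole document.
print(titles_per_heading(root, '//span[@class="product-title"]/i/text()'))
# Relative path: each iteration only searches inside the current node.
print(titles_per_heading(root, './/span[@class="product-title"]/i/text()'))
print(products(root))
```

On the real page the price text is surrounded by newlines, which is why the sketch uses normalize-space() rather than returning the raw text() nodes.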