How to iterate through nodes in Scrapy using Python

Question

I am trying to scrape a website, and the content of the HTML looks something like this:

Aubrey
AGE DEFYING THERAPY CLEANSER 3.4 OZ
$10.99 / 3.40 OZ

Aubrey
AGE DEFYING THERAPY LIQUID
$12.99 / 4.40 OZ

My Python code snippet to extract this is something like:

def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            print(node.xpath('//span[re:test(@class, "product-title")]//text()').extract())
            print(node.xpath('//span[re:test(@class, "product-price")]//text()').extract())

When I run the above Scrapy code, I do not get the expected output: the same content is repeated 100 times. Can someone help me with this?

alecxe · Accepted Answer

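You need to prepend a dot to each inner XPath expression so that it is evaluated in the context of node. Without the dot, an expression beginning with // starts the search from the root of the tree on every iteration, which is why the same content is repeated: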

def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            # The leading dot makes each search relative to the current node.
            print(node.xpath('.//span[re:test(@class, "product-title")]//text()').extract())
            print(node.xpath('.//span[re:test(@class, "product-price")]//text()').extract())
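
To see the difference without running a crawl, here is a minimal sketch of the same idea. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the markup is hypothetical, modeled only on the class names in the XPath expressions above. ElementTree does not accept an absolute // path on an element, so the "search from the root" case is written as a search on the document root inside the loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the class names come from the question.
HTML = """
<div>
  <div class="panel-heading">
    <span class="product-title">CLEANSER 3.4 OZ</span>
    <span class="product-price">$10.99</span>
  </div>
  <div class="panel-heading">
    <span class="product-title">LIQUID</span>
    <span class="product-price">$12.99</span>
  </div>
</div>
"""

root = ET.fromstring(HTML)
panels = root.findall('.//div[@class="panel-heading"]')

# Searching from the root inside the loop: every iteration sees all titles.
from_root = [
    [s.text for s in root.findall('.//span[@class="product-title"]')]
    for _ in panels
]

# Searching relative to each node: every iteration sees only its own title.
from_node = [
    [s.text for s in node.findall('.//span[@class="product-title"]')]
    for node in panels
]

print(from_root)
print(from_node)
```

The first list repeats every title once per panel, which matches the repeated output described in the question. The second list holds one title per panel, which is what the dotted expressions give you.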
