Scrapy: how to extract multiple matching xpaths from one page?

Question

I'm using Scrapy to extract product data from a website. A single page contains multiple products. The HTML of interest looks like this:

 itemprop="name">Hammer 
       Nice hammer! 


 itemprop="name">Screwdriver 
       Cool screwdriver!

Some products don't have a description and will look like this:

 itemprop="name">Nails

Q: What should my parse method look like in order to extract the products together with their descriptions and store them in an array or a file? The array would look like this:

array = [["product1","description1"],["product2","description2"], ..., ["productN","descriptionN"]]

I know how to extract an array A that contains just the product names, and I know how to extract an array B that contains just the descriptions. However, since some products have no description, combining the two by position (C = A + B) would result in mismatches. So I need a way to pair each product with its description only if it has one.
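
To illustrate the mismatch, here is a minimal sketch in plain Python. The lists are made up for illustration (the fourth product, "Saw", is hypothetical and not part of the page above); they stand in for the results of two independent, page-wide extractions.

```python
# Hypothetical results of two separate, page-wide extractions.
names = ["Hammer", "Screwdriver", "Nails", "Saw"]
# "Nails" has no description, so this list is one element shorter.
descriptions = ["Nice hammer!", "Cool screwdriver!", "Sharp saw!"]

# Pairing by position shifts every description after the gap onto the
# wrong product, and zip() silently drops the last product.
pairs = [list(pair) for pair in zip(names, descriptions)]
print(pairs)
```

Here "Nails" ends up paired with the description that belongs to "Saw", and "Saw" disappears from the result entirely.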

alecxe · Accepted Answer

Iterate over products and locate the product names and descriptions:

$ scrapy shell file://$PWD/index.html
In [1]: [
   ...:     (item.css(".productname::text").extract_first(), 
   ...:      item.css(".description::text").extract_first()) 
   ...:     for item in response.css(".product")
   ...: ]
Out[1]: 
[(u'Hammer', u' Nice hammer! '),
 (u'Screwdriver', u'Cool screwdriver!'),
 (u'Nails', None)]

Note that the description is None when a product does not have one.

Working with this HTML sample based on your examples:


    
      Hammer
       Nice hammer! 
    

    
          Screwdriver
          Cool screwdriver!
    

    
      Nails
