Reputation: 31

How to select various elements of a website

I am scraping a website with Scrapy and want to extract a few details of a product, such as its price, description, and features. How do I select each of these elements with CSS or XPath selectors and store them in XML or JSON format?

I have written the following code skeleton. Please advise what I should do from here.

# -*- coding: utf-8 -*-

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'myquotes'

    start_urls = [
            'https://www.amazon.com/international-sales-offers/b/ref=gbps_ftr_m-9_2862_dlt_LD?node=15529609011&gb_f_deals1=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL%252CEXPIRED%252CSOLDOUT%252CUPCOMING,sortOrder:BY_SCORE,MARKETING_ID:ship_export,enforcedCategories:15684181,dealTypes:LIGHTNING_DEAL&pf_rd_p=9b8adb89-8774-4860-8b6e-e7cefc1c2862&pf_rd_s=merchandised-search-9&pf_rd_t=101&pf_rd_i=15529609011&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=AA0VVPMWMQM1MF4XQZKR&ie=UTF8'
    ]

    def parse(self, response):
        # Each class name needs its own leading dot. Chaining them without
        # spaces matches one element that carries all of these classes.
        all_div_quotes = response.css(
            '.a-section.a-spacing-none.tallCellView.gridColumn2.singleCell'
        )

        for quote in all_div_quotes:
            # Query the current item rather than the whole list, and yield
            # inside the loop so that every deal produces its own record.
            yield {
                'price': quote.css('.dealPriceText::text').extract(),
                'details1': quote.css('.a-declarative::text').extract(),
                'details2': quote.css('#shipSoldInfo::text').extract(),
            }

I am running the code using the command:

scrapy crawl myquotes -o myfile.json

to save the output in a JSON file. The problem is that the code does not return the title, product price, and product description as intended. How can I scrape the product name, price, and description from an Amazon page?

Upvotes: 1

Answers (3)

maestro.inc

Reputation: 815

Generally, what you could do is:

Name: response.css("#productTitle::text").extract()

Description: response.css("#productDescription p::text").extract()

With this you should be good to go. CSS selectors tend to stay valid longer than XPath expressions when the page layout changes, so they are usually the better choice.

Upvotes: 0

ibraheem-nadeem

Reputation: 61

The easiest way to check and verify CSS selectors is with the Scrapy shell. Here are the selectors you can use in your case:

Name: response.css("#productTitle::text").get()

Price: not available in my country, so I couldn't test it.

Description: response.css("#productDescription p::text").getall()

Best of luck.

Upvotes: 1

Harshad

Reputation: 61

The usual way to debug a problem like this is to start at the top. I think your very first CSS selector is too specific. According to SelectorGadget, the general CSS selector is

.dealDetailContainer

Yield the whole response without a for loop and check the output to confirm that you're getting some kind of response.

For individual products, when I scraped a different Amazon link, the CSS selector for the product name was

#productTitle::text  (the # here is part of the selector, not a comment marker)

Basically, your CSS selectors are wrong. Use SelectorGadget, and do a normal crawl first before running the command that writes the output to JSON.
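One common way selectors go wrong is pasting a class attribute straight from the browser inspector: a space-separated list of class names is not a CSS selector, because every class needs a leading dot. A minimal sketch of the conversion follows; classes_to_css is a hypothetical helper written for this illustration and is not part of Scrapy.

```python
def classes_to_css(class_attribute: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration only; it is not a Scrapy function.
    """
    class_names = class_attribute.split()
    # Prefix every class with a dot and chain them without spaces, so the
    # selector matches a single element that carries all of the classes.
    return "".join("." + class_name for class_name in class_names)


copied = "a-section a-spacing-none tallCellView"
print(classes_to_css(copied))
```

The result can then be passed to response.css(). A shorter selector with one or two distinctive classes is often enough and is less likely to break when the site adds or removes a class.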

Upvotes: 0
