Reputation: 31
I am scraping a website using scrapy where I want to extract a few details such as price, product description, features etc of a product. I want to know how to select each of these elements using css selectors or xpath selectors and store them in xml or json format.
I have written the following code skeleton. Please guide me what should I do from here.
# -*- coding: utf-8 -*-
import scrapy
import time
class QuotesSpider(scrapy.Spider):
name = 'myquotes'
start_urls = [
'https://www.amazon.com/international-sales-offers/b/ref=gbps_ftr_m-9_2862_dlt_LD?node=15529609011&gb_f_deals1=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL%252CEXPIRED%252CSOLDOUT%252CUPCOMING,sortOrder:BY_SCORE,MARKETING_ID:ship_export,enforcedCategories:15684181,dealTypes:LIGHTNING_DEAL&pf_rd_p=9b8adb89-8774-4860-8b6e-e7cefc1c2862&pf_rd_s=merchandised-search-9&pf_rd_t=101&pf_rd_i=15529609011&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=AA0VVPMWMQM1MF4XQZKR&ie=UTF8'
]
def parse(self, response):
all_div_quotes = response.css('a-section a-spacing-none tallCellView gridColumn2 singleCell')
for quotes in all_div_quotes:
title1 = all_div_quotes.css('.dealPriceText::text').extract()
title2 = all_div_quotes.css('.a-declarative::text').extract()
title3 = all_div_quotes.css('#shipSoldInfo::text').extract()
yield{
'price' : title1,
'details1' : title2,
'details2' : title3
}
I am running the code using the command:
scrapy crawl myquotes -o myfile.json
to save it inside a json file. The problem with this code is that it is not returning the title, product price, product description as intended. If someone could help me with how to scrape the product name, price and description of an amazon page it would be of great help.
Upvotes: 1
Views: 144
Reputation: 815
generallly what you could do is
Name: response.css("#productTitle::text").extract()
Description: response.css("#productDescription p::text").extract()
With this you should be good to go. CSS selector are more constant so they are usually a better bet than using xpath and consequently the way to go
Upvotes: 0
Reputation: 61
The easier way to check and verify CSS selectors is using scrapy shell
.
In your case, I have listed the selectors you can use along with the code:
Name: response.css("#productTitle::text").get()
Price: Price was not available in my country so couldn't test it.
Description: response.css("#productDescription p::text").getall()
Best of luck.
Upvotes: 1
Reputation: 61
The normal method to solve an error like this starting at the top. I think your very first css selector is too detailed. On using the selector gadget, the general css selector is
.dealDetailContainer
Yield the whole response without a for loop and check the output to understand that you're getting some kind of a response.
For products individually, when I scraped a different amazon link the css selector for the product name is
#productTitle::text -># is not a commented line of code here
Basically, you're going wrong with the css selectors. Use the CSS Selector Gadget and before using the command to output it into json, do a normal crawl first.
Upvotes: 0