Reputation: 11
I'm trying to get data from this page https://octopart.com/electronic-parts/integrated-circuits-ics, specifically the view behind the Specs button. I'm trying to get the product names with this code, but it doesn't work:
import scrapy
from scrapy.http import FormRequest

# SpecItem is the project's own Item class (not shown in the question)

class SpecSpider(scrapy.Spider):
    name = 'specName'
    start_urls = ['https://octopart.com/electronic-parts/integrated-circuits-ics']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }

    def parse(self, response):
        # Submit the view-switch form, "clicking" the Specs (serp-grid) button
        return FormRequest.from_response(response, formxpath="//form[@class='btn-group']", clickdata={"value": "serp-grid"}, callback=self.scrape_pages)

    def scrape_pages(self, response):
        #open_in_browser(response)
        items = SpecItem()
        for product in response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']"):
            name = product.xpath(".//tr/td[class='matrix-col-part']/a[class='nowrap']/text()").extract()
            items['ProductName'] = ''.join(name).strip()
            price = product.xpath("//tr/td['4']/div[class='small']/text()").extract()
            items['Price'] = ''.join(price).strip()
            yield items
This XPath, response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']"),
doesn't match anything.
Any suggestions?
Upvotes: 0
Views: 119
Reputation: 632
You are using the wrong XPath syntax:
//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']
The correct form puts "@" before "class". Without "@", class='...' tests for a child element named class, not the class attribute:
//div[@class='inner-body']/div[@class='serp-wrap-all']/..
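A minimal way to see the difference using only the standard library: xml.etree.ElementTree supports a small XPath subset (not Scrapy's XPath engine, and no contains()), but its [tag='text'] and [@attr='value'] predicates mean the same thing as in full XPath. The markup below is a hypothetical, simplified stand-in, not the real page HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (ElementTree needs well-formed XML).
doc = ET.fromstring(
    "<html><body>"
    "<div class='inner-body'>"
    "<div class='serp-wrap-all'><p>results</p></div>"
    "</div>"
    "</body></html>"
)

# Without "@": [class='...'] looks for a CHILD ELEMENT named <class>
# whose text equals the string, so nothing matches.
no_at = doc.findall(".//div[class='inner-body']/div[class='serp-wrap-all']")

# With "@": the predicate compares the class ATTRIBUTE.
with_at = doc.findall(".//div[@class='inner-body']/div[@class='serp-wrap-all']")

print(len(no_at), len(with_at))  # 0 1
```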
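Also, there is no 'matrix-table' table on the page linked above.
Try something like this instead; //* matches any descendant element, and contains() does a substring match on the class attribute: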
//div[@class='inner-body']/div[@class='serp-wrap-all']//*[contains(@class,'matrix-table')]
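To illustrate why an exact class comparison finds nothing when an element carries several classes, here is a sketch on hypothetical markup (the element being a div rather than a table is an assumption for illustration only). ElementTree has no contains(), so the substring test is emulated in plain Python; in Scrapy you would pass the XPath above to response.xpath() directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the target element has two classes and is not a <table>.
doc = ET.fromstring(
    "<div class='serp-wrap-all'>"
    "<section><div class='table-valign-middle matrix-table'>rows</div></section>"
    "</div>"
)

# An exact comparison only matches when the WHOLE attribute equals the string.
exact = doc.findall(".//*[@class='matrix-table']")

# Emulate //*[contains(@class,'matrix-table')]: a substring test
# over every descendant element (".//*" excludes the root itself).
matches = [el for el in doc.findall(".//*") if "matrix-table" in el.get("class", "")]

print(len(exact), [el.tag for el in matches])  # 0 ['div']
```

Note that contains() is a plain substring test, so it would also match a class such as matrix-table-wide.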
Upvotes: 1
Reputation: 84465
If you just want the top-level product name, use the CSS selector
.serp-card-pdp-link
and extract its text.
The median price comes from the CSS selector
.avg-price-faux-btn
You can apply CSS selectors in Scrapy with .css(selector), e.g. response.css(...).
Upvotes: 0