Reputation: 779
from scrapy import Spider
from scrapy.http import Request
class AuthorSpider(Spider):
name = 'book'
start_urls = ['https://www.amazon.sg/s?k=Measuring+Tools+%26+Scales&i=home&crid=1011S67HHJSEW&sprefix=measuring+tools+%26+scales%2Chome%2C408&ref=nb_sb_noss']
def parse(self, response):
books = response.xpath("//h2/a/@href").extract()
for book in books:
url = response.urljoin(book)
yield Request(url, callback=self.parse_book)
def parse_book(self, response):
rows = response.xpath('//table[@id="productDetails_techSpec_section_1"]//tr')
table={}
for row in rows:
brand = row.xpath("//th[@class='a-color-secondary a-size-base prodDetSectionEntry' and contains(text(), 'Brand')]/following-sibling::td/text()").get()
asin = row.xpath("//th[@class='a-color-secondary a-size-base prodDetSectionEntry' and contains(text(), 'ASIN')]/following-sibling::td/text()").get().replace('\u200e',"")
table.update({'Brand':brand,'Asin':asin})
yield table
I want to scrape only brand
and ASIN
from the table I scape the text from the product information
these is the link https://www.amazon.sg/Etekcity-Accurate-Measuring-Packages-Stainless/dp/B08BPB9T1N/ref=sr_1_1?crid=1011S67HHJSEW&keywords=Measuring%2BTools%2B%26%2BScales&qid=1643125635&s=home&sprefix=measuring%2Btools%2B%26%2Bscales%2Chome%2C408&sr=1-1&th=1
Upvotes: 0
Views: 99
Reputation: 1128
If you just need brand and ASIN you don't need to iterate through the whole table. You can use xpath to directly select those attributes. One way to do it is using following.
brand = response.xpath("//th[@class='a-color-secondary a-size-base prodDetSectionEntry' and contains(text(), 'Brand')]/following-sibling::td/text()").get()
asin = response.xpath("//th[@class='a-color-secondary a-size-base prodDetSectionEntry' and contains(text(), 'ASIN')]/following-sibling::td/text()").get()
You might need to clean up the resulting text a bit using str().strip(). All this xpath is saying is "find the th tag with the right class and with a text of 'Brand' or 'ASIN' then look ahead to the next TD tag and grab that text."
Upvotes: 1