Reputation: 1396
I have figured out how to scrape all images from a specific URL, but I really only want the picture of the product on the page. Take this URL, for example: https://www.3bscientific.com/us/simshirt-auscultation-system-size-xl-1022828-cardionics-718-3420xl,p_148_32010.html
I'm attempting to scrape the URL of the picture of the shirt from the page and no other images. A hackish way I've been able to do this is:
picList = []
for image in response.xpath('//img/@src').extract():
    # make each one into a full URL and add it to picList
    picList.append(response.urljoin(image))
print("PICLIST")
print(picList[12])
I noticed the picture URL is always at index 12 in the list, but it feels like there should be a better way to do this, given that I only need the URL of that one picture. I also can't hardcode anything, because this scraper scrapes multiple products at a time.
Upvotes: 0
Views: 154
Reputation: 4822
You can use a more specific XPath that targets the product image through the id of its container div.
In [1]: url = 'https://www.3bscientific.com/us/simshirt-auscultation-system-size-xl-1022828-cardionics-718-3420xl,p_148_32010.html'
In [2]: req = scrapy.Request(url=url)
In [3]: fetch(req)
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.3bscientific.com/il/simshirt-auscultation-system-size-xl-1022828-cardionics-718-3420xl,p_148_32010.html> (referer: None)
In [4]: response.xpath('//div[@id="ProductImageContainer"]//a/img/@src').get()
Out[4]: '/imagelibrary/1022828/1022828_01_SimShirt-Auscultation-System-size-XL.jpg'
In [5]: response.urljoin(response.xpath('//div[@id="ProductImageContainer"]//a/img/@src').get())
Out[5]: 'https://www.3bscientific.com/imagelibrary/1022828/1022828_01_SimShirt-Auscultation-System-size-XL.jpg'
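Since the scraper handles many products, the same lookup could be wrapped in a small helper and called from the spider's parse callback. This is only a minimal sketch: the names `product_image_url` and `PRODUCT_IMAGE_XPATH` are made up for illustration, and it assumes every product page uses the same `ProductImageContainer` id, which was only checked on the page above.

```python
from typing import Optional

# Same XPath as in the shell session above. Assumption: other product
# pages on the site use the same container id.
PRODUCT_IMAGE_XPATH = '//div[@id="ProductImageContainer"]//a/img/@src'


def product_image_url(response) -> Optional[str]:
    """Return the absolute URL of the main product image, or None.

    `response` is expected to be a Scrapy response, i.e. an object that
    provides .xpath() and .urljoin().
    """
    src = response.xpath(PRODUCT_IMAGE_XPATH).get()
    if src is None:
        # No matching <img> on this page; let the caller decide what to do.
        return None
    # Turn the relative src into a full URL based on the page URL.
    return response.urljoin(src)
```

Inside a parse callback this might be used as `yield {"image_url": product_image_url(response)}`. Returning None instead of raising means a page with a different layout shows up as a missing field rather than crashing the crawl.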
Upvotes: 2