Bob

Reputation: 1396

Scrapy: How to Scrape a Single Image URL

I have figured out how to scrape all images from a specific URL, but I really only want the picture of the product on the page. Take this URL, for example: https://www.3bscientific.com/us/simshirt-auscultation-system-size-xl-1022828-cardionics-718-3420xl,p_148_32010.html

I'm trying to scrape the URL of the shirt picture from that page and no other images. A hackish way I've managed to do this is:

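    picList = []  # collects the absolute URL of every <img> on the page
    for image in response.xpath('//img/@src').extract():
        # turn each src into a full URL and add it to the list
        picList.append(response.urljoin(image))
    print("PICLIST")
    print(picList[12])  # the product picture happens to sit at index 12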

I noticed the picture URL is always at index 12 in the list, but it feels like there should be a better way to do this, given that I only need the URL of that one picture. I also can't hardcode anything, because this scraper scrapes multiple products at a time.

Upvotes: 0

Views: 154

Answers (1)

SuperUser

Reputation: 4822

You can use a more specific XPath that targets the product image through the id of its container.

In [1]: url = 'https://www.3bscientific.com/us/simshirt-auscultation-system-size-xl-1022828-cardionics-718-3420xl,p_148_32010.html'

In [2]: req = scrapy.Request(url=url)

In [3]: fetch(req)
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.3bscientific.com/il/simshirt-auscultation-system-size-xl-1022828-cardionics-718-3420xl,p_148_32010.html> (referer: None)

In [4]: response.xpath('//div[@id="ProductImageContainer"]//a/img/@src').get()
Out[4]: '/imagelibrary/1022828/1022828_01_SimShirt-Auscultation-System-size-XL.jpg'

In [5]: response.urljoin(response.xpath('//div[@id="ProductImageContainer"]//a/img/@src').get())
Out[5]: 'https://www.3bscientific.com/imagelibrary/1022828/1022828_01_SimShirt-Auscultation-System-size-XL.jpg'

Upvotes: 2
