Manoj Kumar

Reputation: 41

Unable to scrape image URL (Scrapy)

I am trying to scrape data using Scrapy. Every field is extracted except the product image URL: when I try to extract the image URL, it returns a list of empty strings, as shown in the image below.

[Screenshot: scraped output where image_url is a list of empty strings]

Project Code

menscloths.py (Spider)

import scrapy
from ..items import DataItem

class MensclothsSpider(scrapy.Spider):
    name = 'menscloths'
    next_page = 2
    start_urls = ['https://www.example.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&page=1']

    def parse(self, response):
        products = response.css("div._1xHGtK")
        for product in products:
            # Create a new item for every product instead of reusing (and mutating) one shared object
            items = DataItem()
            name = product.css(".IRpwTa::text").getall()
            brand = product.css("._2WkVRV::text").getall()
            original_price = product.css("._3I9_wc::text").getall()[1]
            # Drop the first character (the currency sign) from the sale price
            sale_price = product.css("._30jeq3::text").getall()[0][1:]
            # This is the field that comes back as a list of empty strings
            image_url = product.css("._2r_T1I::attr(src)").getall()
            # urljoin turns the relative href into an absolute URL
            product_page_url = response.urljoin(product.css("._2UzuFa::attr(href)").get())
            product_category = "men topwear"

            items["name"] = name
            items["brand"] = brand
            items["original_price"] = original_price
            items["sale_price"] = sale_price
            items["image_url"] = image_url
            items["product_page_url"] = product_page_url
            items["product_category"] = product_category
            yield items

            

items.py

import scrapy


class DataItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    brand = scrapy.Field()
    original_price = scrapy.Field()
    sale_price = scrapy.Field()
    image_url = scrapy.Field()
    product_page_url = scrapy.Field()
    product_category = scrapy.Field()

settings.py

BOT_NAME = 'scraper'

SPIDER_MODULES = ['scraper.spiders']
NEWSPIDER_MODULE = 'scraper.spiders'


ITEM_PIPELINES = {
   'scraper.pipelines.ScraperPipeline': 300,
}

Thanks in advance.

Upvotes: 0

Views: 144

Answers (1)

Ayush Garg

Reputation: 2517

I've seen this happen several times before. If you watch the page closely while it loads, the product images appear only after a short delay (about one second, in my case). Your code, however, parses the page as soon as the HTML has been downloaded and reads the images straight away; Scrapy does not run the page's JavaScript, so at that point the image sources are still empty. You need some sort of wait, so that the images have loaded before you extract them.

Upvotes: 2
