I am trying to scrape data using Scrapy. Every field is extracted except the product image URL: when I try to extract the image URL, the spider returns a list of empty strings, as shown in the image below.
Project Code
menscloths.py (Spider)
import scrapy
from ..items import DataItem
class MensclothsSpider(scrapy.Spider):
    name = 'menscloths'
    next_page = 2
    start_urls = ['https://www.example.com/clothing-and-accessories/topwear/pr?sid=clo%2Cash&otracker=categorytree&p%5B%5D=facets.ideal_for%255B%255D%3DMen&page=1']
    def parse(self, response):
        products = response.css("div._1xHGtK")
        for product in products:
            # Create a fresh item per product so items already yielded are not overwritten
            item = DataItem()
            prices = product.css("._3I9_wc::text").getall()
            sale_price = product.css("._30jeq3::text").get(default="")
            product_href = product.css("._2UzuFa::attr(href)").get()
            item["name"] = product.css(".IRpwTa::text").get()
            item["brand"] = product.css("._2WkVRV::text").get()
            # Second text node of the original-price element, if it exists
            item["original_price"] = prices[1] if len(prices) > 1 else None
            # Drop the leading currency symbol from the sale price
            item["sale_price"] = sale_price[1:]
            item["image_url"] = product.css("._2r_T1I::attr(src)").get()
            # Resolve the relative product link against the page URL
            item["product_page_url"] = response.urljoin(product_href) if product_href else None
            item["product_category"] = "men topwear"
            yield item
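To see what Scrapy actually receives, it helps to inspect the raw HTML rather than the DOM shown in the browser's developer tools. The sketch below uses only Python's standard-library html.parser on a hypothetical snippet: the markup, the empty src attribute and the data-src attribute holding the real URL are assumptions illustrating a common lazy-loading pattern, not taken from the real page. Reading src from every matching img reproduces the list of empty strings described above.

```python
from html.parser import HTMLParser

# HYPOTHETICAL server-rendered markup: each <img> has an empty src, and the
# real URL sits in an assumed data-src attribute until JavaScript runs.
RAW_HTML = """
<div class="_1xHGtK">
  <img class="_2r_T1I" src="" data-src="https://img.example.com/shirt-1.jpg">
</div>
<div class="_1xHGtK">
  <img class="_2r_T1I" src="" data-src="https://img.example.com/shirt-2.jpg">
</div>
"""


class ImgAttrCollector(HTMLParser):
    """Collect one attribute from every <img> that carries the given class."""

    def __init__(self, css_class, attr):
        super().__init__()
        self.css_class = css_class
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "img" and self.css_class in classes:
            self.values.append(attrs.get(self.attr))


def collect(html, css_class, attr):
    parser = ImgAttrCollector(css_class, attr)
    parser.feed(html)
    parser.close()
    return parser.values


print(collect(RAW_HTML, "_2r_T1I", "src"))       # ['', '']
print(collect(RAW_HTML, "_2r_T1I", "data-src"))  # the two example URLs
```

If the real page's raw HTML contains no attribute with the image URL at all, the URL is added only by JavaScript, and the page has to be rendered before the images can be extracted.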
items.py
import scrapy
class DataItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    brand = scrapy.Field()
    original_price = scrapy.Field()
    sale_price = scrapy.Field()
    image_url = scrapy.Field()
    product_page_url = scrapy.Field()
    product_category = scrapy.Field()
settings.py
BOT_NAME = 'scraper'
SPIDER_MODULES = ['scraper.spiders']
NEWSPIDER_MODULE = 'scraper.spiders'
ITEM_PIPELINES = {
    'scraper.pipelines.ScraperPipeline': 300,
}
Thanks in advance.
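I've seen this happen many times before. If you watch the page closely as it loads, you can see that the images appear only after a short delay (about one second for me), most likely because JavaScript fills them in after the initial HTML arrives. Your code, however, parses the page as soon as it is downloaded and reads the image attributes immediately, without waiting for the images to load. You need some way to wait for the images to load before extracting them. Keep in mind that Scrapy itself does not execute JavaScript, so this waiting has to happen in a browser-based renderer (for example scrapy-playwright, Splash, or Selenium) rather than through a simple delay inside the spider.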
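One way to add that wait is scrapy-playwright, a third-party plugin that downloads pages through a real browser. A minimal sketch, assuming the plugin and its browsers are installed (pip install scrapy-playwright, then playwright install): add these lines to settings.py. Only requests that opt in through their meta are rendered; all others still use Scrapy's regular download handler.

```python
# settings.py additions for scrapy-playwright (assumes the plugin is installed)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

The spider would then issue its requests from start_requests with something like `scrapy.Request(url, meta={"playwright": True, "playwright_page_methods": [PageMethod("wait_for_selector", "img._2r_T1I[src^='http']")]})`, importing PageMethod from scrapy_playwright.page. The selector used to decide that the images have loaded is an assumption; images further down the page may only load once scrolled into view, which would need an additional page method.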