Zain Asif

Reputation: 9

Error in scraping an ecommerce website daraz.pk

I am trying to scrape daraz.pk and ran into this error. The spider scrapes all the values on the page until the last one, which comes back as None, and then it throws TypeError: 'NoneType' object is not iterable. I have tried exception handling, but it didn't work. I'm sharing my code here in case anyone can help. I'm using Selenium and Scrapy together to get the descriptions of the items on the items page.

import scrapy
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from ..items import EcomItem
class DarazSpider(scrapy.Spider):
    name = 'daraz'
    def start_requests(self):
        path = 'C:\Program Files (x86)\chromedriver.exe'
        driver = Chrome(executable_path=path)
        driver.get('https://www.daraz.pk/')
        electronics = driver.find_element(By.NAME, 'q')
        electronics.send_keys('Books')
        electronics.send_keys(Keys.RETURN)
        link_elements = driver.find_elements(By.XPATH,'/html/body/div[3]/div/div[2]/div/div/div/div[2]/div/div/div/div[2]/div[2]/a[text()]')
        for link_el in link_elements:
                    href = link_el.text
                    print(href)
    def parse(self, response):
        pass

Here is the error:

Traceback (most recent call last):
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1905, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1815, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status)
--- <exception caught here> ---
  File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "C:\Users\Intag\New folder (2)\lib\site-packages\scrapy\crawler.py", line 103, in crawl
    start_requests = iter(self.spider.start_requests())
builtins.TypeError: 'NoneType' object is not iterable
2022-08-06 10:29:20 [twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "C:\Users\Intag\New folder (2)\lib\site-packages\scrapy\crawler.py", line 103, in crawl
    start_requests = iter(self.spider.start_requests())
TypeError: 'NoneType' object is not iterable

Upvotes: 0

Views: 246

Answers (1)

Md. Fazlul Hoque

Reputation: 16187

You can get the desired data directly from the site's API. The data is loaded dynamically by JavaScript through an API call that uses the GET method and returns JSON. Requesting that endpoint directly is the easiest and most robust way to grab the data.
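As for the traceback itself: its last frame shows Scrapy calling iter(self.spider.start_requests()). The start_requests in the question only prints and never yields or returns anything, so it returns None, and iter(None) raises the TypeError. Here is a plain-Python sketch of that mechanism with no Scrapy involved. The function names and link values are made up for illustration only.

```python
def start_requests_without_yield():
    # Mirrors the question's method: it does its work and prints,
    # but never yields or returns anything, so the call evaluates to None.
    links = ["link-1", "link-2"]  # hypothetical stand-in for scraped links
    for link in links:
        print(link)


def start_requests_with_yield():
    # Having a `yield` makes this a generator function, so calling it
    # returns an iterable generator object instead of None.
    links = ["link-1", "link-2"]  # hypothetical stand-in for scraped links
    for link in links:
        yield link


# Scrapy effectively does iter(spider.start_requests()), as the traceback shows.
try:
    iter(start_requests_without_yield())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

collected = list(iter(start_requests_with_yield()))

print(error_message)
print(collected)
```

In a real spider the yielded values would be scrapy.Request objects rather than strings. The spider in the example that follows yields a request from start_requests, so it does not hit this error.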

Example:

import scrapy
import json
from scrapy.crawler import CrawlerProcess
class TestSpider(scrapy.Spider):
    name = 'test'

    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1
        }

    def start_requests(self):
        headers= {
            'content-type': 'application/json',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
        }
        api_url='https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1'
        yield scrapy.Request(
            url= api_url,
            method='GET',
            headers=headers,
            callback=self.parse
            )
       
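    def parse(self, response):
        # The endpoint returns JSON; product entries sit under mods -> listItems.
        resp = json.loads(response.body)
        for item in resp['mods']['listItems']:
            yield {
                # productUrl comes without a scheme, so prepend 'https:'.
                'productUrl': 'https:' + item['productUrl']
            }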
       
if __name__ == "__main__":
    # CrawlerProcess takes optional settings; the spider class is passed to crawl().
    process = CrawlerProcess()
    process.crawl(TestSpider)
    process.start()

Output:

Crawled (200) <GET https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1> (referer: None)   
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/5-i144834997-s1306536157.html?search=1'}        
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/4-i146864039-s1309826616.html?search=1'}        
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i229320627-s1449691508.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i229571902-s1449944276.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i219883778-s1432847877.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/pmc-nmdcat-nums-agha-khan-2022-i209146784-s1415196801.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/nmdcat-bookmbbscommbbscompkpmc-mdcat-practice-books-2022entry-test-preparation-booksentry-test-booksentry-test-preparation-books-2022guide-for-solved-past-paper-papers-exam-exams-test-tests-book-n-books-bnb-multan-ghar-kitab-mkg-new-fareed-fbc-i276082277-s1491310765.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/tenses-made-easy-by-efzal-anware-mufti-i209992860-s1416720338.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/sk-original-golden-13medical-books-in-urdu-i198834812-s1395012400.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i242170073-s1461239796.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i270001029-s1483708982.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/css-pms-iqra-ud-din-css-o-css-2022-css-2023-i220043944-s1433189818.html?search=1'}

... and so on
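If you need more than the first page, one option is to build the URL from its query parameters and change page. The sketch below uses only the standard library. It assumes that ajax, q and page are enough and that the tracking parameters in the URL above can be dropped, which I have not verified. The helper names and the sample payload are hypothetical; only the mods -> listItems -> productUrl layout is taken from the spider above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.daraz.pk/catalog/"


def build_search_url(query, page):
    # Assumption: the endpoint accepts just these three parameters;
    # the tracking parameters from the original URL are left out.
    params = {"ajax": "true", "q": query, "page": page}
    return BASE_URL + "?" + urlencode(params)


def extract_product_urls(payload):
    # Fall back to an empty list if 'mods' or 'listItems' is missing or null,
    # so an unexpected response does not raise a KeyError or TypeError.
    items = (payload.get("mods") or {}).get("listItems") or []
    return ["https:" + item["productUrl"] for item in items if item.get("productUrl")]


# Hypothetical sample payload that imitates the layout used in parse().
sample = json.loads(
    '{"mods": {"listItems": [{"productUrl": "//www.daraz.pk/products/example-i1-s2.html"}]}}'
)

urls = extract_product_urls(sample)
page_two = build_search_url("books", 2)

print(urls)
print(page_two)
```

Inside the spider, parse could yield a new scrapy.Request for build_search_url('books', next_page) until listItems comes back empty. Treat that stopping condition as an assumption as well.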

Upvotes: 3
