Reputation: 9
I am trying to scrape daraz.pk and ran into this error. The spider prints every value on the page, but after the last one it gets a None value and throws "'NoneType' object is not iterable". I have tried exception handling, but that didn't work. I'm sharing my code here in case anyone can help. I'm using Selenium and Scrapy together to get the descriptions of items on the item page.
**
import scrapy
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from ..items import EcomItem


class DarazSpider(scrapy.Spider):
    name = 'daraz'

    def start_requests(self):
        path = 'C:\Program Files (x86)\chromedriver.exe'
        driver = Chrome(executable_path=path)
        driver.get('https://www.daraz.pk/')
        electronics = driver.find_element(By.NAME, 'q')
        electronics.send_keys('Books')
        electronics.send_keys(Keys.RETURN)
        link_elements = driver.find_elements(By.XPATH, '/html/body/div[3]/div/div[2]/div/div/div/div[2]/div/div/div/div[2]/div[2]/a[text()]')
        for link_el in link_elements:
            href = link_el.text
            print(href)

    def parse(self, response):
        pass
**
Here is the error:
**
Traceback (most recent call last):
d = crawler.crawl(*args, **kwargs)
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1905, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1815, in _cancellableInlineCallbacks
_inlineCallbacks(None, gen, status)
--- <exception caught here> ---
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "C:\Users\Intag\New folder (2)\lib\site-packages\scrapy\crawler.py", line 103, in crawl
start_requests = iter(self.spider.start_requests())
builtins.TypeError: 'NoneType' object is not iterable
2022-08-06 10:29:20 [twisted] CRITICAL:
Traceback (most recent call last):
File "C:\Users\Intag\New folder (2)\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "C:\Users\Intag\New folder (2)\lib\site-packages\scrapy\crawler.py", line 103, in crawl
start_requests = iter(self.spider.start_requests())
TypeError: 'NoneType' object is not iterable
**
Upvotes: 0
Views: 246
Reputation: 16187
You can get the desired data directly from the site's API. The data is loaded dynamically by JavaScript through an API that is called with a GET request and returns JSON. Requesting that endpoint directly is the easiest and most robust way to grab the data.
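On the traceback in the question: the last frame shows Scrapy calling `iter(self.spider.start_requests())`. The question's `start_requests` only prints inside its loop and never yields or returns anything, so the call evaluates to `None` once the loop finishes. That would explain why every value is printed before the error appears. The sketch below reproduces the same failure in plain Python, without Scrapy or Selenium; the function names and the stand-in values are hypothetical and only illustrate the mechanism.

```python
def start_requests_broken():
    """Mirrors the question's method: it does work but never yields or returns."""
    for text in ["link 1", "link 2"]:  # stand-in for the Selenium link texts
        print(text)
    # Falling off the end means the call evaluates to None.


def start_requests_fixed():
    """Yields one value per URL, which makes the function a generator."""
    for url in ["https://www.daraz.pk/"]:
        # In a real spider this would be scrapy.Request(url, callback=self.parse).
        yield url


error_message = None
try:
    # Roughly what scrapy/crawler.py does with the spider's start_requests().
    iter(start_requests_broken())
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

requests_out = list(start_requests_fixed())
print(requests_out)
```

Wrapping the loop body in try/except does not help here, because the exception is raised by the caller after the method has already returned. The method has to yield (or return a list of) requests.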
Example spider using the API:
import json

import scrapy
from scrapy.crawler import CrawlerProcess


class TestSpider(scrapy.Spider):
    name = 'test'
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1
    }

    def start_requests(self):
        headers = {
            'content-type': 'application/json',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
        }
        api_url = 'https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1'
        # start_requests must yield (or return) requests; Scrapy iterates over the result.
        yield scrapy.Request(
            url=api_url,
            method='GET',
            headers=headers,
            callback=self.parse
        )

    def parse(self, response):
        # The endpoint returns JSON; the products are under mods -> listItems.
        resp = json.loads(response.body)
        for item in resp['mods']['listItems']:
            yield {
                'productUrl': 'https:' + item['productUrl']
            }


if __name__ == "__main__":
    # CrawlerProcess takes optional settings; the spider class is passed to crawl().
    process = CrawlerProcess()
    process.crawl(TestSpider)
    process.start()
Output:
Crawled (200) <GET https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1> (referer: None)
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/5-i144834997-s1306536157.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/4-i146864039-s1309826616.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i229320627-s1449691508.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i229571902-s1449944276.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i219883778-s1432847877.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/pmc-nmdcat-nums-agha-khan-2022-i209146784-s1415196801.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/nmdcat-bookmbbscommbbscompkpmc-mdcat-practice-books-2022entry-test-preparation-booksentry-test-booksentry-test-preparation-books-2022guide-for-solved-past-paper-papers-exam-exams-test-tests-book-n-books-bnb-multan-ghar-kitab-mkg-new-fareed-fbc-i276082277-s1491310765.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/tenses-made-easy-by-efzal-anware-mufti-i209992860-s1416720338.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/sk-original-golden-13medical-books-in-urdu-i198834812-s1395012400.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i242170073-s1461239796.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/-i270001029-s1483708982.html?search=1'}
2022-08-06 12:08:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.daraz.pk/catalog/?_keyori=ss&ajax=true&clickTrackInfo=textId--2543448522407782846__abId--296224__pvid--721834c6-aa06-4851-a758-c1dceed517aa__matchType--1__srcQuery--None__spellQuery--books&from=suggest_normal&page=1&q=books&spm=a2a0e.home.search.1.35e34937dlzwzf&sugg=books_0_1>
{'productUrl': 'https://www.daraz.pk/products/css-pms-iqra-ud-din-css-o-css-2022-css-2023-i220043944-s1433189818.html?search=1'}
... and so on
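The API URL contains a `page=1` query parameter, and the product links sit under `mods` -> `listItems` in the JSON. Below is a minimal sketch of two helpers that could build on that: one swaps the `page` value in a URL, the other pulls product URLs out of a decoded response without raising if a key is missing. The helper names, the shortened demo URL, and the sample payload are assumptions for illustration; only the key names and the `page` parameter come from the spider above, and whether the site serves further pages this way is not verified here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(api_url, page):
    """Return api_url with its 'page' query parameter set to the given page."""
    parts = urlsplit(api_url)
    # Keep every other parameter, drop any existing 'page', then append the new one.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def extract_product_urls(payload):
    """Collect absolute product URLs from a decoded response dict.

    Missing or None 'mods' / 'listItems' yield an empty list instead of an error.
    """
    mods = payload.get("mods") or {}
    items = mods.get("listItems") or []
    return ["https:" + item["productUrl"] for item in items if item.get("productUrl")]


# Hypothetical shortened URL; the real one carries more tracking parameters.
demo_url = "https://www.daraz.pk/catalog/?ajax=true&page=1&q=books"
next_url = with_page(demo_url, 2)

# Hypothetical payload with the same shape the spider's parse() relies on.
sample = {"mods": {"listItems": [
    {"productUrl": "//www.daraz.pk/products/5-i144834997-s1306536157.html?search=1"},
]}}
found = extract_product_urls(sample)
empty = extract_product_urls({"mods": None})
```

Inside the spider, `parse` could yield a new `scrapy.Request(with_page(response.url, n), callback=self.parse)` for the next page number and stop once `extract_product_urls` returns an empty list.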
Upvotes: 3