I am encountering an issue while crawling a website using Scrapy. I am making a GET request to a specific API endpoint, but the request is failing with an SSL error. Below is the code for the request and the subsequent error message.
```python
from scrapy import Request  # fetch() below is the Scrapy shell helper

url = 'https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229'
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "en-US,en;q=0.9,ml;q=0.8",
    "Connection": "keep-alive",
    "Content-Type": "application/json; charset=utf-8",
    "DNT": "1",
    "Referer": "https://www.macmap.org/en//query/results?reporter=764&partner=004&product=010229&level=6",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}
request = Request(
    url=url,
    method='GET',
    dont_filter=True,
    headers=headers,
)
fetch(request)
```
However, I received the following response:
```
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
---------------------------------------------------------------------------
ResponseNeverReceived Traceback (most recent call last)
Cell In[1], line 27
5 headers = {
6 "Accept": "application/json, text/javascript, */*; q=0.01",
7 "Accept-Language": "en-US,en;q=0.9,ml;q=0.8",
(...)
16 "X-Requested-With": "XMLHttpRequest",
17 }
20 request = Request(
21 url=url,
22 method='GET',
23 dont_filter=True,
24 headers=headers,
25 )
---> 27 fetch(request)
File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/scrapy/shell.py:110, in Shell.fetch(self, request_or_url, spider, redirect, **kwargs)
108 response = None
109 try:
--> 110 response, spider = threads.blockingCallFromThread(
111 reactor, self._schedule, request, spider)
112 except IgnoreRequest:
113 pass
File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/twisted/internet/threads.py:119, in blockingCallFromThread(reactor, f, *a, **kw)
117 result = queue.get()
118 if isinstance(result, failure.Failure):
--> 119 result.raiseException()
120 return result
File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/twisted/python/failure.py:475, in Failure.raiseException(self)
474 def raiseException(self):
--> 475 raise self.value.with_traceback(self.tb)
ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
```
The package versions I am using are:
Here's a simple spider that will pull data from that API endpoint.
```python
import json
from urllib.parse import urlencode

import scrapy

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/json; charset=utf-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive',
    'Referer': 'https://www.macmap.org/en//query/results?reporter=826&partner=710&product=010229&level=6',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
}


class ProductSpider(scrapy.Spider):
    name = "product"

    def start_requests(self):
        params = {
            'countryCode': '826',
            'level': '8',
            'code': '010229',
        }
        base_url = "https://www.macmap.org/api/v2/ntlc-products"
        url_with_params = f"{base_url}?{urlencode(params)}"
        yield scrapy.Request(url_with_params, self.parse, headers=headers)

    def parse(self, response):
        # The endpoint returns a JSON array; yield each object as a separate item.
        records = json.loads(response.text)
        for record in records:
            yield record
```
I changed the country code from 764 to 826 to get more results. Change it back to 764 if you need your original data.
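Because only the `countryCode` query parameter changes between the two versions, the URL construction from `start_requests` can be pulled out into a small helper and reused for several reporter countries. This is a sketch under assumptions: `build_products_url` is a hypothetical helper name, and its defaults for `level` and `code` simply copy the values used in the spider.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.macmap.org/api/v2/ntlc-products"


def build_products_url(country_code: str, level: str = "8", code: str = "010229") -> str:
    """Hypothetical helper: build the ntlc-products URL for one reporter country."""
    # urlencode keeps the dict's insertion order, so the query string follows
    # the same parameter order as the URL written by hand in the question.
    params = {"countryCode": country_code, "level": level, "code": code}
    return f"{BASE_URL}?{urlencode(params)}"


# 764 is the question's original country code; 826 is the one used in the spider.
for cc in ("764", "826"):
    print(build_products_url(cc))
```

Inside `start_requests`, you could loop over the codes you need and `yield scrapy.Request(build_products_url(cc), self.parse, headers=headers)` for each one.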
The `parse` method yields one record at a time. Alternatively, you could yield all of them together as a single item.
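A minimal sketch of the two options, written as plain functions over the response body so it runs without Scrapy. The function names and the `records` key are hypothetical; inside the spider, `parse` would pass them `response.text`.

```python
import json
from typing import Any, Dict, Iterator, List


def one_item_per_record(body: str) -> Iterator[Dict[str, Any]]:
    """What the spider's parse() does now: one item per JSON object."""
    for record in json.loads(body):
        yield record


def one_batch_item(body: str) -> Dict[str, List[Dict[str, Any]]]:
    """Alternative: wrap the whole JSON array in one item (hypothetical 'records' key)."""
    return {"records": json.loads(body)}


# Sample body shaped like the API response (only the Code field, for brevity).
sample = json.dumps([{"Code": "0102291050"}, {"Code": "0102291090"}])

print(len(list(one_item_per_record(sample))))  # 2 separate items
print(len(one_batch_item(sample)))             # 1 item holding both records
```

In the spider, the batch variant would replace the `for` loop with `yield one_batch_item(response.text)`. Scrapy accepts a plain dict as an item, so a list nested inside it is fine; the trade-off is that every record then travels through the item pipelines as one unit.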
An extract from the logs:
```
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102291050', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Bulls of the Schwyz, Fribourg and spotted Simmental breeds, other than for slaughter'}
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102291090', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Other'}
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102292100', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: For slaughter'}
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102292910', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: Other: Young male bovine animals, intended for fattening'}
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102291030', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the Schwyz and Fribourg breeds, other than for slaughter'}
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102291040', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the spotted Simmental breed, other than for slaughter'}
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>
{'Code': '0102291020', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the grey, brown or yellow mountain breeds and spotted Pinzgau breed, other than for slaughter'}
```