Jff

Reputation: 121

scrapy - ResponseNeverReceived ('SSL routines', '', 'unexpected eof while reading')

I am encountering an issue while crawling a website using Scrapy. I am making a GET request to a specific API endpoint, but the request is failing with an SSL error. Below is the code for the request and the subsequent error message.


    from scrapy import Request  # fetch() itself is provided by the Scrapy shell

    url = 'https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229'

    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "en-US,en;q=0.9,ml;q=0.8",
        "Connection": "keep-alive",
        "Content-Type": "application/json; charset=utf-8",
        "DNT": "1",
        "Referer": "https://www.macmap.org/en//query/results?reporter=764&partner=004&product=010229&level=6",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }


    request = Request(
        url=url,
        method='GET',
        dont_filter=True,
        headers=headers,
    )

    fetch(request)

However, I received the following response:

2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
2024-07-17 06:07:03 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.macmap.org/api/v2/ntlc-products?countryCode=764&level=8&code=010229> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
---------------------------------------------------------------------------
ResponseNeverReceived                     Traceback (most recent call last)
Cell In[1], line 27
      5 headers = {
      6     "Accept": "application/json, text/javascript, */*; q=0.01",
      7     "Accept-Language": "en-US,en;q=0.9,ml;q=0.8",
   (...)
     16     "X-Requested-With": "XMLHttpRequest",
     17 }
     20 request = Request(
     21     url=url,
     22     method='GET',
     23     dont_filter=True,
     24     headers=headers,
     25 )
---> 27 fetch(request)

File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/scrapy/shell.py:110, in Shell.fetch(self, request_or_url, spider, redirect, **kwargs)
    108 response = None
    109 try:
--> 110     response, spider = threads.blockingCallFromThread(
    111         reactor, self._schedule, request, spider)
    112 except IgnoreRequest:
    113     pass

File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/twisted/internet/threads.py:119, in blockingCallFromThread(reactor, f, *a, **kw)
    117 result = queue.get()
    118 if isinstance(result, failure.Failure):
--> 119     result.raiseException()
    120 return result

File /opt/venv-python3.9-scrapy/lib/python3.9/site-packages/twisted/python/failure.py:475, in Failure.raiseException(self)
    474 def raiseException(self):
--> 475     raise self.value.with_traceback(self.tb)

ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]

The package versions I am using are:

Upvotes: 0

Views: 90

Answers (1)

datawookie

Reputation: 6564

Here's a simple spider that will pull data from that API endpoint.

    import json
    from urllib.parse import urlencode

    import scrapy

    # Browser-like headers sent with the API request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Content-Type': 'application/json; charset=utf-8',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive',
        'Referer': 'https://www.macmap.org/en//query/results?reporter=826&partner=710&product=010229&level=6',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
    }

    class ProductSpider(scrapy.Spider):
        name = "product"

        def start_requests(self):
            # Query parameters for the API endpoint.
            params = {
                'countryCode': '826',
                'level': '8',
                'code': '010229',
            }

            base_url = "https://www.macmap.org/api/v2/ntlc-products"
            url_with_params = f"{base_url}?{urlencode(params)}"

            yield scrapy.Request(url_with_params, self.parse, headers=headers)

        def parse(self, response):
            # The endpoint returns a JSON array; yield each record as its own item.
            records = json.loads(response.text)

            for record in records:
                yield record

I changed the country code to get more results. You can revert to your original country code.
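To go back to your original country code, only the `countryCode` parameter needs to change. A minimal sketch using the values from your question (the names `params` and `url_with_params` just mirror the spider; this only builds the URL and does not send a request):

```python
from urllib.parse import urlencode

# Parameters taken from the URL in the question (country code 764).
params = {
    "countryCode": "764",
    "level": "8",
    "code": "010229",
}

base_url = "https://www.macmap.org/api/v2/ntlc-products"
url_with_params = f"{base_url}?{urlencode(params)}"

print(url_with_params)
```

This should reproduce the URL you were requesting originally.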

The `parse` method yields one record at a time. Alternatively, you could yield all of them together as a single batch.
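A rough sketch of the batch variant, written as a plain function so it runs without Scrapy installed. The name `parse_batch`, the `"records"` key, and the stand-in response with made-up data are all assumptions for illustration, not part of the spider above:

```python
import json
from types import SimpleNamespace


def parse_batch(response):
    """Yield all records from the response as one item instead of one per record."""
    records = json.loads(response.text)
    # A single item wrapping the whole list ("records" is an arbitrary key name).
    yield {"records": records}


# Stand-in for a Scrapy response with invented data, only so the sketch is runnable.
fake_response = SimpleNamespace(
    text='[{"Code": "0001", "Name": "first"}, {"Code": "0002", "Name": "second"}]'
)

items = list(parse_batch(fake_response))
print(len(items), len(items[0]["records"]))
```

In the spider this would be the body of `parse(self, response)`. Whether one item per record or one item per response is better probably depends on how your pipelines and exporters expect the data.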

An extract from the logs:

2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291050', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Bulls of the Schwyz, Fribourg and spotted Simmental breeds, other than for slaughter'}                           
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291090', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Other'}                                                                                                          
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102292100', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: For slaughter'}                                                                          
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102292910', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight exceeding 80\xa0kg but not exceeding 160\xa0kg: Other: Young male bovine animals, intended for fattening'}                               
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291030', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the Schwyz and Fribourg breeds, other than for slaughter'}                                            
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291040', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the spotted Simmental breed, other than for slaughter'}                                               
2024-07-18 05:53:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.macmap.org/api/v2/ntlc-products?countryCode=826&level=8&code=010229>                                                                                     
{'Code': '0102291020', 'Name': 'Live cattle (excl. pure-bred for breeding): Other: Of a weight not exceeding 80\xa0kg: Heifers of the grey, brown or yellow mountain breeds and spotted Pinzgau breed, other than for slaughter'}

Upvotes: 0
