Vincent

Reputation: 137

Google App Engine - ConnectionError: ('Connection aborted.', error(13, 'Permission denied'))

I am dealing with a connection error and need help. I am using Python 2.7 and Google App Engine for this project. I am trying to use the nba_py third-party API to retrieve additional information to display on my website, but I am getting a ConnectionError. The error looks like this:

First connection error:

requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', HTTPException('Deadline exceeded while waiting for HTTP response from URL: http://stats.nba.com/stats/scoreboard?LeagueID=00&GameDate=03%2F03%2F2018&DayOffset=0',))

I'm not sure if I've solved the issue correctly, but I changed http to https in BASE_URL in the __init__.py file.

After that, it gave me an SSLError:

requests/adapters.py", line 506, in send
    raise SSLError(e, request=request)
SSLError: HTTPSConnectionPool(host='stats.nba.com', port=443): Max retries exceeded with url: /stats/scoreboard?DayOffset=0&GameDate=03%2F03%2F2018&LeagueID=00 (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.",))

Again, I'm not sure if I've fixed it correctly, but I added name: ssl and version: latest to the app.yaml file, following the Using Python SSL documentation.

After that, it gave me another connection error which I've been stuck on for a while now.

requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(13, 'Permission denied'))

Any help or links to where it will help me solve this issue would be greatly appreciated. Thanks!

Upvotes: 3

Views: 1379

Answers (3)

Tianhui Li

Reputation: 111

With requests toolbelt, requests now works in both production and development:

# Make every requests Session use the App Engine (URLFetch) adapter
from requests_toolbelt.adapters import appengine
appengine.monkeypatch()

For more information, see this article.

Upvotes: 2

A.Queue

Reputation: 1572

It seems that the NBA wants this endpoint (http://stats.nba.com/stats) to be used only for ordinary browsing and not to be accessed programmatically, especially considering that it is not publicly documented. I would advise you to contact them directly before accessing this endpoint, particularly if you want to do so from Google App Engine.

I came to this conclusion while sending some HTTP requests to this endpoint with curl. For example, using the URL from your first example

http://stats.nba.com/stats/scoreboard?LeagueID=00&GameDate=03%2F03%2F2018&DayOffset=0

I noticed that:

  1. A curl request works when sent from localhost, but it needs some headers. If those headers are present, it returns data in JSON format, the same as when the request comes from a browser.
  2. The same curl request doesn't work from GCP Shell or a Compute VM instance; it keeps waiting for an answer indefinitely.
  3. A request also 'fails' if it carries the User-Agent that App Engine Standard sets. Both deployed applications and dev_appserver.py set this User-Agent automatically, which is why you get:

ConnectionError: ('Connection aborted.', HTTPException('Deadline exceeded while waiting for HTTP response from URL: ...

To know for sure you would have to ask the NBA, but my guess is that they want traffic to go through their web page and are protecting themselves from scraping by blocking some IP ranges and some user agents that most of their visitors would not use. Contacting them directly about this issue is the way to go.
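To make observation 1 concrete, here is a minimal sketch of attaching browser-like headers to a request with the standard library. The header values below are assumptions chosen for illustration; the tests described above did not record which headers the endpoint actually requires, and `build_request` is a hypothetical helper name.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2.7

# Assumed header set: which headers the endpoint really needs is not known.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json",
    "Referer": "http://stats.nba.com/",
}

URL = ("http://stats.nba.com/stats/scoreboard"
       "?LeagueID=00&GameDate=03%2F03%2F2018&DayOffset=0")


def build_request(url, headers=None):
    """Return a Request carrying browser-like headers; nothing is sent yet."""
    request = Request(url)
    for name, value in (headers or BROWSER_LIKE_HEADERS).items():
        request.add_header(name, value)
    return request
```

The resulting object could then be passed to `urlopen(request, timeout=10)` so that a blocked call fails quickly instead of waiting indefinitely. Since App Engine Standard sets its own User-Agent automatically (observation 3), this may well not be enough from a deployed application.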

Upvotes: 0

Dan Cornilescu

Reputation: 39824

Since you're using a 3rd party API, you can't really improve its performance and availability yourself. What you may be able to do is reduce the impact of such failures on your own clients, by not placing the calls to the external API directly in the critical path of building responses to their requests.

If the info that you obtain from the 3rd party API and pass on to your clients is not live, you could use a cache setup:

  • a background (periodic and/or on-demand) job makes requests to the 3rd party API to populate/refresh the cache
  • you always reply to your clients with info from the cache, not directly from the 3rd party API, so performance and availability are under your control
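The cache setup above can be sketched roughly as follows. This is an in-memory illustration only: `ScoreboardCache` and the `fetch` callable are hypothetical names, and on App Engine the storage would more likely need to be something shared between instances rather than a Python attribute.

```python
import threading


class ScoreboardCache(object):
    """Holds the last good payload; client requests only ever read from it."""

    def __init__(self, fetch):
        # fetch: hypothetical callable that wraps the 3rd party API call
        self._fetch = fetch
        self._lock = threading.Lock()
        self._data = None

    def refresh(self):
        """Called by the background job. Returns True if the cache was updated."""
        try:
            data = self._fetch()
        except Exception:
            # The 3rd party call failed: keep serving the previous payload.
            return False
        with self._lock:
            self._data = data
        return True

    def get(self):
        """Called on the request path; never touches the 3rd party API."""
        with self._lock:
            return self._data
```

The point of the split is that a slow or failing `refresh()` only delays the next update; `get()` keeps answering from whatever was stored last.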

Another approach (which could be used even if the 3rd party data is live or not cacheable, so that you need to obtain it on every request from your clients):

  • reply to your client request with the portion of the response that you can provide immediately and a script instructing the client to follow up with AJAX requests for the 3rd party API data still pending
  • launch background requests for the 3rd party API data, retried if needed
  • whenever you get the 3rd party API data you reply with it to the AJAX requests
  • on your client side the script assembles the data received via AJAX into the page and displays it

You can even mix the 2 approaches for a solution based on an on-demand refreshed cache, where the AJAX responses are:

  • immediately built from the cache if the data is "fresh" enough, or
  • delayed until fresh data is received from the 3rd party via requests triggered by that very client request
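A rough sketch of that freshness decision, under the assumption that "fresh enough" means "fetched no longer than `max_age_seconds` ago". `OnDemandCache`, `fetch` and `clock` are hypothetical names introduced for the example; the clock is injectable only so the behaviour can be checked without waiting.

```python
import time


class OnDemandCache(object):
    """Serve cached data while it is fresh; otherwise fetch before replying."""

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self._fetch = fetch              # hypothetical 3rd party API wrapper
        self._max_age = max_age_seconds
        self._clock = clock
        self._data = None
        self._fetched_at = None          # None means nothing cached yet

    def is_fresh(self):
        if self._fetched_at is None:
            return False
        return self._clock() - self._fetched_at <= self._max_age

    def get(self):
        """Return cached data if fresh, else refresh the cache first."""
        if not self.is_fresh():
            self._data = self._fetch()
            self._fetched_at = self._clock()
        return self._data
```

In the AJAX flow, `get()` would be called from the handler answering the follow-up request, so only stale data makes the client wait for the 3rd party.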

Of course, your code needs to be prepared for and deal with every such failure in interacting with the 3rd party API.
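One possible way to handle such failures is a small retry wrapper with a fallback value. This is a sketch, not a recommendation of specific numbers: `call_with_retries` is a hypothetical helper, the doubling delay is an assumption, and catching `Exception` is deliberately broad for illustration; real code would likely narrow it to the errors the HTTP library raises.

```python
import time


def call_with_retries(fetch, attempts=3, delay_seconds=1.0, fallback=None,
                      sleep=time.sleep):
    """Call fetch() up to `attempts` times; return `fallback` if all fail."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Wait before the next try, doubling the delay each time.
            # No sleep after the final failed attempt.
            if attempt < attempts - 1:
                sleep(delay_seconds * (2 ** attempt))
    return fallback
```

The `fallback` could be the last cached payload, so a client still gets a response when the 3rd party API stays unreachable.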

Upvotes: 1
