coolpy

Reputation: 163

Python-Twisted: Reverse Proxy to HTTPS API: Could not connect

I am trying to build a reverse proxy that talks to certain APIs (like Twitter, GitHub, Instagram), so that any client application I want can call those APIs through my proxy (think of it as an API manager).

Also, I am using an LXC-container to do this.

For example, here is the simplest code I could put together, adapted from the examples in the Twisted docs:

from twisted.internet import reactor
from twisted.web import proxy, server
from twisted.python.log import startLogging
from sys import stdout
startLogging(stdout)

site = server.Site(proxy.ReverseProxyResource('https://api.github.com/users/defunkt', 443, b''))
reactor.listenTCP(8080, site)
reactor.run()

When I run curl inside the container, I get a valid response (meaning I get the appropriate JSON).

Here is how I used the CURL command:

curl https://api.github.com/users/defunkt

And here is the output I get:

{
  "login": "defunkt",
  "id": 2,
  "avatar_url": "https://avatars.githubusercontent.com/u/2?v=3",
  "gravatar_id": "",
  "url": "https://api.github.com/users/defunkt",
  "html_url": "https://github.com/defunkt",
  "followers_url": "https://api.github.com/users/defunkt/followers",
  "following_url": "https://api.github.com/users/defunkt/following{/other_user}",
  "gists_url": "https://api.github.com/users/defunkt/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/defunkt/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/defunkt/subscriptions",
  "organizations_url": "https://api.github.com/users/defunkt/orgs",
  "repos_url": "https://api.github.com/users/defunkt/repos",
  "events_url": "https://api.github.com/users/defunkt/events{/privacy}",
  "received_events_url": "https://api.github.com/users/defunkt/received_events",
  "type": "User",
  "site_admin": true,
  "name": "Chris Wanstrath",
  "company": "GitHub",
  "blog": "http://chriswanstrath.com/",
  "location": "San Francisco",
  "email": "[email protected]",
  "hireable": true,
  "bio": null,
  "public_repos": 107,
  "public_gists": 280,
  "followers": 15153,
  "following": 208,
  "created_at": "2007-10-20T05:24:19Z",
  "updated_at": "2016-02-26T22:34:27Z"
}

However, when I try to reach the proxy from Firefox at:

http://10.5.5.225:8080/

I get: "Could not connect"

This is what my Twisted log looks like:

2016-02-27 [-] Log opened.
2016-02-27 [-] Site starting on 8080
2016-02-27 [-] Starting factory
2016-02-27 [-] Starting factory
2016-02-27 [-] "10.5.5.225" - - [27/Feb/2016: +0000] "GET / HTTP/1.1" 501 26 "-" "Mozilla/5.0 (X11; Debian; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0"
2016-02-27 [-] Stopping factory

How can I use Twisted to make an API call (most APIs are HTTPS nowadays anyway) and get the expected response (basically, a "200" status with the JSON body)?

I tried looking at this question: Convert HTTP Proxy to HTTPS Proxy in Twisted

But it didn't make much sense from a coding point-of-view (or mention anything about reverse-proxying).

Edit: I also tried swapping the HTTPS API call for a plain HTTP one, which works from curl:

curl http://openlibrary.org/authors/OL1A.json

However, going through the proxy I still get the same error in my browser (as mentioned above).

Edit 2: I tried running your code, but it fails with this error (originally posted as a screenshot):

builtins.AttributeError: 'str' object has no attribute 'decode'

Upvotes: 3

Views: 2787

Answers (1)

Glyph

Reputation: 31880

If you read the API documentation for ReverseProxyResource, you will see that the signature of __init__ is:

def __init__(self, host, port, path, reactor=reactor):

and "host" is documented as "the host of the web server to proxy".

So you are passing a URI where Twisted expects a host.
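
To make the mismatch concrete, here is a minimal sketch that splits a full URL into the three pieces the constructor expects. It uses only the standard library; `split_for_reverse_proxy` is a hypothetical helper name, not part of Twisted, and the default ports are the usual 80/443 assumption.

```python
from urllib.parse import urlsplit


def split_for_reverse_proxy(url):
    """Split a URL into (host, port, path) pieces (hypothetical helper)."""
    parts = urlsplit(url)
    # Fall back to the conventional port for the scheme when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    # The answer's code passes the path as bytes, so encode it here.
    path = parts.path.encode("ascii")
    return parts.hostname, port, path


host, port, path = split_for_reverse_proxy("https://api.github.com/users/defunkt")
print(host, port, path)  # api.github.com 443 b'/users/defunkt'
```

Note that the scheme is discarded: knowing the port is 443 does not by itself make the proxy speak TLS to the upstream server.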

Worse yet, ReverseProxyResource is designed for local use on a web server, and doesn't quite support https:// URLs out of the box.

It does have a (very limited) extensibility hook though - proxyClientFactoryClass - and to apologize for ReverseProxyResource not having what you need out of the box, I will show you how to use that to extend ReverseProxyResource to add https:// support so you can use the GitHub API :).

from twisted.web import proxy, server
from twisted.logger import globalLogBeginner, textFileLogObserver
from twisted.protocols.tls import TLSMemoryBIOFactory
from twisted.internet import ssl, defer, task, endpoints
from sys import stdout
globalLogBeginner.beginLoggingTo([textFileLogObserver(stdout)])

class HTTPSReverseProxyResource(proxy.ReverseProxyResource, object):
    def proxyClientFactoryClass(self, *args, **kwargs):
        """
        Make all connections using HTTPS.
        """
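        # optionsForClientTLS() needs a text hostname. self.host may be
        # bytes (Python 2 native string) or str (Python 3), so decode
        # only when it is bytes; calling .decode() on a str raises
        # AttributeError on Python 3.
        host = self.host
        if isinstance(host, bytes):
            host = host.decode("ascii")
        return TLSMemoryBIOFactory(
            ssl.optionsForClientTLS(host), True,
            super(HTTPSReverseProxyResource, self)
            .proxyClientFactoryClass(*args, **kwargs))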
    def getChild(self, path, request):
        """
        Ensure that implementation of C{proxyClientFactoryClass} is honored
        down the resource chain.
        """
        child = super(HTTPSReverseProxyResource, self).getChild(path, request)
        return HTTPSReverseProxyResource(child.host, child.port, child.path,
                                         child.reactor)

@task.react
def main(reactor):
    import sys
    forever = defer.Deferred()
    myProxy = HTTPSReverseProxyResource('api.github.com', 443,
                                        b'/users/defunkt')
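    # Child names are bytes path segments; b"" is the segment after a
    # trailing slash. A bytes literal works on both Python 2 and 3.
    myProxy.putChild(b"", myProxy)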
    site = server.Site(myProxy)
    endpoint = endpoints.serverFromString(
        reactor,
        dict(enumerate(sys.argv)).get(1, "tcp:8080:interface=127.0.0.1")
    )
    endpoint.listen(site)
    return forever

If you run this, curl http://localhost:8080/ should do what you expect.

I've taken the liberty of modernizing your Twisted code somewhat; endpoints instead of listenTCP, logger instead of twisted.python.log, and react instead of starting the reactor yourself.

The weird little putChild piece at the end there is because when we pass b"/users/defunkt" as the path, that means a request for / will result in the client requesting /users/defunkt/ (note the trailing slash), which is a 404 in GitHub's API. If we explicitly proxy the empty-child-segment path as if it did not have the trailing segment, I believe it will do what you expect.
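
The trailing-slash behaviour can be sketched in plain Python. This is only an approximation: it assumes the child resource's upstream path is formed as the parent path, then "/", then the URL-quoted segment, and `child_path` is a hypothetical helper rather than a Twisted function.

```python
from urllib.parse import quote


def child_path(parent_path, segment):
    """Approximate the upstream path of a child resource (assumed formula)."""
    # quote() accepts bytes and returns str, so re-encode the result.
    return parent_path + b"/" + quote(segment, safe="").encode("utf-8")


# A request for "/" asks the root resource for the empty child segment b"".
print(child_path(b"/users/defunkt", b""))       # b'/users/defunkt/'
print(child_path(b"/users/defunkt", b"repos"))  # b'/users/defunkt/repos'
```

Registering the resource as its own empty child skips this step for "/", so the upstream request keeps the original path without the extra slash.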

PLEASE NOTE: proxying from plain-text HTTP to encrypted HTTPS can be extremely dangerous, so I've added a default listening interface here of localhost-only. If your bytes transit over an actual network, you should ensure that they are properly encrypted with TLS.
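
Because the listening side is configured with an endpoint description string, switching between a localhost-only plain-text listener and a TLS listener is a matter of changing that string. The sketch below only builds the strings; `listen_description` is a hypothetical helper, and the key and certificate file names are placeholders.

```python
def listen_description(port, interface="127.0.0.1", key_file=None, cert_file=None):
    """Build a server endpoint description string (hypothetical helper)."""
    if key_file and cert_file:
        # TLS listener: clients must connect with https://.
        return "ssl:{}:privateKey={}:certKey={}:interface={}".format(
            port, key_file, cert_file, interface)
    # Plain-text listener, bound to localhost unless told otherwise.
    return "tcp:{}:interface={}".format(port, interface)


print(listen_description(8080))
# tcp:8080:interface=127.0.0.1
print(listen_description(8443, "0.0.0.0", "server.key", "server.crt"))
# ssl:8443:privateKey=server.key:certKey=server.crt:interface=0.0.0.0
```

Either string can be given as the first command-line argument to the script above, which passes it to endpoints.serverFromString.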

Upvotes: 7
