w33t

Reputation: 245

Twisted HTTPS Client

I am having trouble accessing content served over HTTPS using the Twisted Python library. I am new to this library and assume there is some concept I am missing that is causing the issue, though judging by the example, perhaps not.

Here is a link to the page where I found the example: https://twistedmatrix.com/documents/current/web/howto/client.html

Under the heading HTTP over SSL

from twisted.python.log import err
from twisted.web.client import Agent
from twisted.internet import reactor
from twisted.internet.ssl import optionsForClientTLS

def display(response):
    print("Received response")
    print(response)

def main():
    contextFactory = optionsForClientTLS(u"https://example.com/")
    agent = Agent(reactor, contextFactory)
    d = agent.request("GET", "https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()

When I run this code, it fails immediately. I get an error that looks like this:

Traceback (most recent call last):
  File "https.py", line 19, in <module>
    main()
  File "https.py", line 11, in main
    contextFactory = optionsForClientTLS(u"https://example.com/")
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1336, in optionsForClientTLS
    return ClientTLSOptions(hostname, certificateOptions.getContext())
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1198, in __init__
    self._hostnameBytes = _idnaBytes(hostname)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 86, in _idnaBytes
    return idna.encode(text)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+003A at position 6 of u'https://example' not allowed

This error led me to believe that the parameter being passed into optionsForClientTLS was incorrect. It calls for a hostname, not a full URL, so I shortened the parameter to simply example.com. Once that change was made, the function completed successfully.
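To make the hostname-versus-URL distinction concrete, here is a minimal sketch that pulls the bare host out of a URL. It assumes Python 3 and uses only the standard library; the helper name hostname_from_url is made up for illustration and is not part of Twisted.

```python
from urllib.parse import urlparse


def hostname_from_url(url):
    """Return only the host part of a URL (hypothetical helper)."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("no hostname in URL: {!r}".format(url))
    return host


# The full URL contains ':' and '/', which are not valid inside a hostname
# label; only the bare host should be handed to optionsForClientTLS.
print(hostname_from_url(u"https://example.com/"))  # prints example.com
```

The value returned here is the kind of string optionsForClientTLS accepts, which is why the first traceback disappears once the scheme and trailing slash are dropped.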

Unfortunately though, after making the change, the script now failed at the line invoking agent.request. The error it supplied was this:

Traceback (most recent call last):
  File "https.py", line 19, in <module>
    main()
  File "https.py", line 13, in main
    d = agent.request("GET", "https://example.com/")
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 1596, in request
    endpoint = self._getEndpoint(parsedURI)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 1580, in _getEndpoint
    return self._endpointFactory.endpointForURI(uri)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 1456, in endpointForURI
    uri.port)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 982, in creatorForNetloc
    context = self._webContextFactory.getContext(hostname, port)
AttributeError: 'ClientTLSOptions' object has no attribute 'getContext'

This error leads me to believe that the object produced by optionsForClientTLS is not the type of object that Agent expects at creation: Agent tries to call a method, getContext, that the object does not have. With all that said, I have two questions.

  1. Is this example deprecated? The previous examples that make http requests all work like a charm. Am I doing something wrong, or is the example no longer valid?
  2. I am only looking for a simple way to retrieve data from a server using HTTPS. If doing things this way is not the solution, is anyone familiar with how HTTPS requests can be made using twisted?

Upvotes: 2

Views: 1752

Answers (1)

notorious.no

Reputation: 5107

Yes, you're absolutely correct that the example in the docs is wrong. I noticed the bug while working with treq. Try following this example from v14. That said, you should use treq rather than Twisted directly, since most of the heavy lifting has been taken care of for you. Here's a simple conversion of your example:

from __future__ import print_function
import treq
from twisted.internet import defer, task
from twisted.python.log import err

@defer.inlineCallbacks
def display(response):
    content = yield treq.content(response)
    print('Content: {0}'.format(content))

def main(reactor):
    d = treq.get('https://twistedmatrix.com')
    d.addCallback(display)
    d.addErrback(err)
    return d

task.react(main)

As you can see, treq takes care of the SSL details for you. The display() callback function can be used to extract various components of the HTTP response, such as headers, status codes, and the body. If you only need a single component, such as the response body, then you can simplify further like so:

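from __future__ import print_function
import treq
from twisted.internet import task
from twisted.python.log import err

def main(reactor):
    d = treq.get('https://twistedmatrix.com')
    d.addCallback(treq.content)     # get response content when available
    d.addCallback(print)            # print the body once it has been read
    d.addErrback(err)               # log a failure from any step above
    return d

task.react(main)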

Upvotes: 3
