I am having some trouble accessing content served over HTTPS with the Twisted Python library. I am new to this library and assume I am missing some concept that is causing the issue, though perhaps not, since I am following an official example.
Here is the page where I found the example: https://twistedmatrix.com/documents/current/web/howto/client.html
It is under the heading "HTTP over SSL":
from twisted.python.log import err
from twisted.web.client import Agent
from twisted.internet import reactor
from twisted.internet.ssl import optionsForClientTLS

def display(response):
    print("Received response")
    print(response)

def main():
    contextFactory = optionsForClientTLS(u"https://example.com/")
    agent = Agent(reactor, contextFactory)
    d = agent.request("GET", "https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()
When I run this code, it fails immediately with an error like this:
Traceback (most recent call last):
  File "https.py", line 19, in <module>
    main()
  File "https.py", line 11, in main
    contextFactory = optionsForClientTLS(u"https://example.com/")
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1336, in optionsForClientTLS
    return ClientTLSOptions(hostname, certificateOptions.getContext())
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1198, in __init__
    self._hostnameBytes = _idnaBytes(hostname)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 86, in _idnaBytes
    return idna.encode(text)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+003A at position 6 of u'https://example' not allowed
This error led me to believe that the argument passed to optionsForClientTLS was incorrect: the rejected codepoint U+003A is the colon in "https:", which is not a valid hostname character. The function expects a hostname, not a full URL, so I shortened the argument to just example.com. With that change, the call completed successfully.
Unfortunately, the script then failed at the call to agent.request instead, with this error:
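A minimal sketch of that fix, assuming Python 3, where urllib.parse.urlsplit returns text (optionsForClientTLS rejects bytes hostnames). The helper name hostname_for_tls is hypothetical; it just strips the scheme, port and path so that only the hostname reaches optionsForClientTLS.

```python
from urllib.parse import urlsplit


def hostname_for_tls(url):
    """Return only the host part of a URL (hypothetical helper).

    optionsForClientTLS() validates its argument as a hostname, so the
    scheme, port and path have to be removed before calling it.
    """
    host = urlsplit(url).hostname  # lowercased; None if the URL has no host
    if host is None:
        raise ValueError("URL has no hostname: {!r}".format(url))
    return host


print(hostname_for_tls("https://example.com/"))  # prints example.com
```

Calling optionsForClientTLS(hostname_for_tls(u"https://example.com/")) produces the same u"example.com" argument the question ends up passing by hand.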
Traceback (most recent call last):
  File "https.py", line 19, in <module>
    main()
  File "https.py", line 13, in main
    d = agent.request("GET", "https://example.com/")
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 1596, in request
    endpoint = self._getEndpoint(parsedURI)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 1580, in _getEndpoint
    return self._endpointFactory.endpointForURI(uri)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 1456, in endpointForURI
    uri.port)
  File "/home/amaricich/.local/lib/python2.7/site-packages/twisted/web/client.py", line 982, in creatorForNetloc
    context = self._webContextFactory.getContext(hostname, port)
AttributeError: 'ClientTLSOptions' object has no attribute 'getContext'
This error leads me to believe that the object returned by optionsForClientTLS is not the type Agent expects as its second argument: Agent ends up calling a getContext method that the object does not have. With all that said, I have two questions.
Yes, you're absolutely correct that the example in the docs is wrong; I noticed the bug while working with treq. Try following the corresponding example from the v14 docs instead. That said, you should use treq rather than trying to use Twisted directly, since it takes care of most of the heavy lifting for you. Here's a simple conversion of your example:
from __future__ import print_function

import treq
from twisted.internet import defer, task
from twisted.python.log import err

@defer.inlineCallbacks
def display(response):
    content = yield treq.content(response)
    print('Content: {0}'.format(content))

def main(reactor):
    d = treq.get('https://twistedmatrix.com')
    d.addCallback(display)
    d.addErrback(err)
    return d

task.react(main)
As you can see, treq takes care of the SSL details for you. The display() callback can be used to extract various components of the HTTP response, such as headers, status code, and body. If you only need a single component, such as the response body, you can simplify further like so:
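from __future__ import print_function

import treq
from twisted.internet import task
from twisted.python.log import err

def main(reactor):
    d = treq.get('https://twistedmatrix.com')
    d.addCallback(treq.content)  # get response content when available
    d.addCallback(print)  # print the body once it has been read
    d.addErrback(err)  # log failures instead of printing None
    return d

task.react(main)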
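Along the lines of the display() callback described above, a minimal sketch of pulling the status and a header out of the response, assuming the object treq.get() fires with exposes Twisted's IResponse attributes (code, phrase, and headers as a twisted.web.http_headers.Headers). The function name summarize is hypothetical, and the body still has to be read asynchronously with treq.content() as in the answer.

```python
def summarize(response):
    """Hypothetical callback: status and content type of a response.

    Only synchronous attributes are used; the body is not read here.
    """
    # getRawHeaders() returns a list of raw byte values, or the default.
    content_type = response.headers.getRawHeaders(b"content-type", [b"unknown"])[0]
    return {
        "status": response.code,
        "phrase": response.phrase,
        "content_type": content_type,
    }
```

In the answer's main(), d.addCallback(summarize) followed by d.addCallback(print) would print this dict instead of the body.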