tehryan

Reputation: 25081

using cookies with twisted.web.client

I'm trying to make a web client application using Twisted, but I'm having some trouble with cookies. Does anyone have an example I can look at?

Upvotes: 3

Views: 2871

Answers (3)

mor

Reputation: 11

from twisted.internet import reactor
from twisted.web import client

def getPage(url, contextFactory=None, *args, **kwargs):
    return client._makeGetterFactory(
        url,
        CustomHTTPClientFactory,
        contextFactory=contextFactory,
        *args, **kwargs).deferred

class CustomHTTPClientFactory(client.HTTPClientFactory):

    def __init__(self,url, method='GET', postdata=None, headers=None,
                 agent="Twisted PageGetter", timeout=0, cookies=None,
                 followRedirect=1, redirectLimit=20):
        client.HTTPClientFactory.__init__(self, url, method, postdata,
                                          headers, agent, timeout, cookies,
                                          followRedirect, redirectLimit)

    def page(self, page):
        # Fire the deferred with the body, the response headers, and the
        # cookies together, instead of the body alone.
        if self.waiting:
            self.waiting = 0
            res = {
                'page': page,
                'headers': self.response_headers,
                'cookies': self.cookies,
            }
            self.deferred.callback(res)

if __name__ == '__main__':
    def cback(result):
        # result is the dict built in CustomHTTPClientFactory.page
        for k in result:
            print('%s ==> %s' % (k, result[k]))
        reactor.stop()

    def eback(error):
        print(error.getTraceback())
        reactor.stop()

    d = getPage('http://example.com', agent='example web client',
                cookies={'some': 'cookie'})
    d.addCallback(cback)
    d.addErrback(eback)

    reactor.run()

Upvotes: 1

Jean-Paul Calderone

Reputation: 48335

While it's true that getPage doesn't easily allow direct access to the request or response headers (just one example of how getPage isn't a super awesome API), cookies are actually supported.

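from twisted.web.client import getPage

# Placeholder cookie name and value; put the cookies you want to send here.
cookies = {'name': 'value'}
d = getPage(url, cookies=cookies)  # url: the page you want to fetch
def cbPage(result):
    # The same dict now also holds any cookies the server set.
    print('Look at my cookies: %r' % (cookies,))
d.addCallback(cbPage)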

Any cookies in the dictionary when it is passed to getPage will be sent. Any new cookies the server sets in response to the request will be added to the dictionary.
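To make the "same dictionary in, updated dictionary out" behaviour concrete, here is a minimal standard-library sketch. It is not Twisted's code: build_cookie_header and merge_set_cookie are hypothetical helper names, and the cookie values are made up. It only illustrates the idea of sending what is in the dict and then merging the server's Set-Cookie values back into that same dict.

```python
from http.cookies import SimpleCookie


def build_cookie_header(cookies):
    """Render a cookie dict as the value of a Cookie request header."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


def merge_set_cookie(cookies, set_cookie_values):
    """Merge Set-Cookie header values into the given dict, in place."""
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            # Attributes such as Path or HttpOnly are dropped; only the
            # name/value pair is kept, as in a plain cookies dict.
            cookies[name] = morsel.value
    return cookies


# Hypothetical values, for illustration only.
cookies = {"session": "abc123"}
request_header = build_cookie_header(cookies)
merge_set_cookie(cookies, ["theme=dark; Path=/", "session=xyz789; HttpOnly"])
```

After the merge, the caller's own dict holds both the updated session value and the new theme cookie, which is the effect described above for the dict passed to getPage.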

You might have missed this feature when looking at getPage because the getPage signature doesn't have a cookies parameter anywhere in it! However, it does take **kwargs, and this is how cookies is supported: any extra arguments passed to getPage that it doesn't know about itself, it passes on to HTTPClientFactory.__init__. Take a look at that method's signature to see all of the things you can pass to getPage.
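The forwarding pattern can be sketched without Twisted at all. In the snippet below, SketchClientFactory and get_page are hypothetical stand-ins (their parameter lists are assumptions, not Twisted's real signatures); the point is only that a wrapper taking **kwargs accepts options it never names, and that inspecting the target's __init__ signature reveals them.

```python
import inspect


class SketchClientFactory(object):
    """Hypothetical stand-in for a client factory; not Twisted's class."""

    def __init__(self, url, method="GET", headers=None, timeout=0, cookies=None):
        self.url = url
        self.method = method
        self.headers = headers or {}
        self.timeout = timeout
        # Keep the caller's dict object so later updates are visible to them.
        self.cookies = cookies if cookies is not None else {}


def get_page(url, *args, **kwargs):
    """Hypothetical wrapper: names no options itself, forwards them all."""
    return SketchClientFactory(url, *args, **kwargs)


my_cookies = {"some": "cookie"}
factory = get_page("http://example.com", cookies=my_cookies, timeout=30)

# The wrapper's signature hides the options it accepts ...
wrapper_params = list(inspect.signature(get_page).parameters)
# ... while the factory's __init__ signature lists them.
factory_params = list(inspect.signature(SketchClientFactory.__init__).parameters)
```

Here wrapper_params contains only url, args, and kwargs, while factory_params includes cookies and timeout, which is why reading the factory's signature is the way to discover what the wrapper accepts.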

Upvotes: 7

tehryan

Reputation: 25081

Turns out there is no easy way, AFAICT. The headers are stored on twisted.web.client.HTTPClientFactory but are not available from twisted.web.client.getPage(), which is the function designed for pulling back a web page. I ended up rewriting the function so that it returns the response headers along with the page data:

from twisted.web import client

def getPage(url, contextFactory=None, *args, **kwargs):
    fact = client._makeGetterFactory(
        url,
        client.HTTPClientFactory,
        contextFactory=contextFactory,
        *args, **kwargs)
    return fact.deferred.addCallback(lambda data: (data, fact.response_headers))

Upvotes: 2
