Reputation: 3186
I have a snippet of code, shown below, that uses urllib2. I'm trying to convert it to pycurl to benefit from pycurl's proxy support. The converted pycurl code is shown after the original code. I want to know how to change the urllib2.urlopen(req).read() to something similar in pycurl, maybe using something like StringIO?
urllib2 code:
URL = 'URL'
UN = 'UN'
PWD = 'PWD'
HEADERS = { 'Accept': 'application/json',
            'Connection': 'Keep-Alive',
            'Accept-Encoding' : 'gzip',
            'Authorization' : 'Basic %s' % base64.encodestring('%s:%s' % (UN, PWD)) }
req = urllib2.Request(URL, headers=HEADERS)
response = urllib2.urlopen(req, timeout=(KEEP_ALIVE))
# header - print response.info()
decompressor = zlib.decompressobj(16+zlib.MAX_WBITS)
remainder = ''
while True:
    tmp = decompressor.decompress(response.read(CHUNKSIZE))
The pycurl conversion with proxy support:
URL = 'URL'
UN = 'UN'
PWD = 'PWD'
HEADERS = [ 'Accept : application/json',
            'Connection : Keep-Alive',
            'Accept-Encoding : gzip',
            'Authorization : Basic %s' % base64.encodestring('%s:%s' % (UN, PWD)) ]
req = pycurl.Curl()
req.setopt(pycurl.CONNECTTIMEOUT, KEEP_ALIVE)
req.setopt(pycurl.HTTPHEADER, HEADERS)
req.setopt(pycurl.TIMEOUT, 1+KEEP_ALIVE)
req.setopt(pycurl.PROXY, 'http://my-proxy')
req.setopt(pycurl.PROXYPORT, 8080)
req.setopt(pycurl.PROXYUSERPWD, "proxy_access_user : proxy_access_password")
req.setopt(pycurl.URL, URL)
response = req.perform()
decompressor = zlib.decompressobj(16+zlib.MAX_WBITS)
remainder = ''
while True:
    tmp = decompressor.decompress(urllib2.urlopen(req).read(CHUNKSIZE))
Thanks in advance.
Upvotes: 1
Views: 971
Reputation: 365787
Unlike urllib2, which returns an object that you can use to get the data, curl needs you to pass it an object that it can use to store the data.
The simple way to do this, used in most of the examples, is to pass a file object as the WRITEDATA option. You might think you could just pass a StringIO here, like this:
# ...
s = StringIO.StringIO()
req.setopt(pycurl.WRITEDATA, s)
req.perform()
data = s.getvalue()
Unfortunately, that won't work: the file object has to be a real file (or at least something with a C-level file descriptor), and a StringIO doesn't qualify.
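You can see the difference for yourself without touching curl at all. A minimal standalone sketch (names are mine, not from the answer; I use io.BytesIO, which behaves like the question's StringIO here): a real temporary file exposes an OS-level descriptor via fileno(), while an in-memory buffer raises instead.

```python
import io
import tempfile

# A real file has an OS-level file descriptor, which is what a
# WRITEDATA file object must provide.
f = tempfile.NamedTemporaryFile()
real_fd = f.fileno()  # an integer descriptor from the OS
f.close()

# An in-memory buffer has no descriptor, so curl can't write to it
# directly. (The question's StringIO behaves the same way.)
buf = io.BytesIO()
try:
    buf.fileno()
    has_fd = True
except io.UnsupportedOperation:
    has_fd = False
```

Here has_fd ends up False, which is exactly why passing an in-memory buffer as WRITEDATA fails.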
You could of course use a NamedTemporaryFile, but if you'd prefer to keep the data in memory (or, better, not store it in memory or on disk at all, but just process it on the fly), that won't help.
The solution is to use the WRITEFUNCTION option instead:
s = StringIO.StringIO()
req.setopt(pycurl.WRITEFUNCTION, s.write)
req.perform()
data = s.getvalue()
As you can see, you can use a StringIO for this if you want; in fact, that's exactly what the curl object documentation from pycurl does. But it's not really simplifying things much over any other way of accumulating strings (like putting them in a list and ''.join-ing them, or even just concatenating them onto a string).
Note that I linked to the C-level libcurl docs, not the pycurl docs, because pycurl's documentation basically just says "FOO does the same thing as CURLOPT_FOO" (even when there are differences, like the fact that your WRITEFUNCTION doesn't get the size, nmemb, and userdata parameters).
What if you want to stream the data on the fly? Just use a WRITEFUNCTION that accumulates and processes it on the fly. You won't be writing a loop yourself, but curl will be looping internally and driving the process. For example:
z = zlib.decompressobj(16+zlib.MAX_WBITS)  # gzip wrapper, as in your original code
s = []
def handle(chunk):
    s.append(z.decompress(chunk))
    return len(chunk)
req.setopt(pycurl.WRITEFUNCTION, handle)
req.perform()
s.append(z.flush())
data = ''.join(s)
curl will call your function once for each chunk of data it retrieves, so the entire loop happens inside that req.perform() call. (It may also call it again with 0 bytes at the end, so make sure your callback function can handle that. I think z.decompress can, but you might want to verify that.)
There are ways to limit the size of each write, to abort the download in the middle, to get the header as part of the write instead of separately, etc., but usually you won't need to touch those.
Upvotes: 2