Reputation: 21
I have a pretty annoying issue at the moment. When I process to a httplib2.request with a way too large page, I would like to be able to stop it cleanly.
For example :
from httplib2 import Http
url = 'http://media.blubrry.com/podacademy/p/content.blubrry.com/podacademy/Neuroscience_and_Society_1.mp3'
h = Http(timeout=5)
h.request(url, 'GET')
In this example, the url is a podcast and it will keep being downloaded forever. My main process will hang indefinitely in this situation.
I have tried to set it in a separate thread using this code and to delete straight my object.
def http_worker(url, q):
h = Http()
print 'Http worker getting %s' % url
q.put(h.request(url, 'GET'))
def process(url):
q = Queue.Queue()
t = Thread(target=http_worker, args=(url, q))
t.start()
tid = t.ident
t.join(3)
if t.isAlive():
try:
del t
print 'deleting t'
except: print 'error deleting t'
else: print q.get()
check_thread(tid)
process(url)
Unfortunately, the thread is still active and will continue to consume cpu / memory.
def check_thread(tid):
import sys
print 'Thread id %s is still active ? %s' % (tid, tid in sys._current_frames().keys() )
Thank you.
Upvotes: 1
Views: 351
Reputation: 21
Ok I found an hack to be able to deal with this issue.
The best solution so far is to set a maximum of data read and to stop reading from the socket. The data is read from the method _safe_read of httplib module. In order to overwrite this method, I used this lib : http://blog.rabidgeek.com/?tag=wraptools
And voila :
from httplib import HTTPResponse, IncompleteRead, MAXAMOUNT
from wraptools import wraps
@wraps(httplib.HTTPResponse._safe_read)
def _safe_read(original_method, self, amt):
"""Read the number of bytes requested, compensating for partial reads.
Normally, we have a blocking socket, but a read() can be interrupted
by a signal (resulting in a partial read).
Note that we cannot distinguish between EOF and an interrupt when zero
bytes have been read. IncompleteRead() will be raised in this
situation.
This function should be used when <amt> bytes "should" be present for
reading. If the bytes are truly not available (due to EOF), then the
IncompleteRead exception can be used to detect the problem.
"""
# NOTE(gps): As of svn r74426 socket._fileobject.read(x) will never
# return less than x bytes unless EOF is encountered. It now handles
# signal interruptions (socket.error EINTR) internally. This code
# never caught that exception anyways. It seems largely pointless.
# self.fp.read(amt) will work fine.
s = []
total = 0
MAX_FILE_SIZE = 3*10**6
while amt > 0 and total < MAX_FILE_SIZE:
chunk = self.fp.read(min(amt, httplib.MAXAMOUNT))
if not chunk:
raise IncompleteRead(''.join(s), amt)
total = total + len(chunk)
s.append(chunk)
amt -= len(chunk)
return ''.join(s)
In this case, MAX_FILE_SIZE is set to 3Mb.
Hopefully, this will help others.
Upvotes: 1