user3843970

Reputation: 21

How to shut down an httplib2 request when it takes too long

I have a pretty annoying issue at the moment. When I make an httplib2 request for a page that is far too large, I would like to be able to stop it cleanly.

For example:

from httplib2 import Http
url = 'http://media.blubrry.com/podacademy/p/content.blubrry.com/podacademy/Neuroscience_and_Society_1.mp3'
h = Http(timeout=5)
h.request(url, 'GET')

In this example, the URL points to a podcast, so the download effectively never finishes and my main process hangs indefinitely.
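For what it's worth, httplib2's timeout is applied to the underlying socket, so it only bounds each individual read, not the request as a whole; a server that keeps delivering data quickly never trips it. A minimal sketch of that behaviour (an illustration with an in-memory stream, not httplib2 itself):

```python
import io
import time

PER_READ_TIMEOUT = 5.0  # seconds, as in Http(timeout=5)

def read_all(stream, chunk_size=8192):
    """Read a stream to EOF; each read is fast, so a per-read
    timeout would never fire no matter how large the stream is."""
    total = 0
    while True:
        start = time.time()
        chunk = stream.read(chunk_size)
        # every individual read finishes well inside the timeout
        assert time.time() - start < PER_READ_TIMEOUT
        if not chunk:
            return total
        total += len(chunk)

# A 1 MB in-memory "download": no single read stalls, so a socket
# timeout would never abort it, however big the payload.
print(read_all(io.BytesIO(b"x" * 2**20)))  # -> 1048576
```

The same arithmetic applies to a real socket: an endless podcast stream that sends a chunk every few milliseconds never exceeds the 5-second per-read limit.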

I tried running the request in a separate thread with the code below, then deleting the thread object outright.

import Queue
from threading import Thread

def http_worker(url, q):
    h = Http()
    print 'Http worker getting %s' % url
    q.put(h.request(url, 'GET'))

def process(url):
    q = Queue.Queue()
    t = Thread(target=http_worker, args=(url, q))
    t.start()
    tid = t.ident
    t.join(3)
    if t.isAlive():
        try:
            del t
            print 'deleting t'
        except:
            print 'error deleting t'
    else:
        print q.get()

    check_thread(tid)

process(url)

Unfortunately, the thread stays alive and keeps consuming CPU and memory.

def check_thread(tid):
    import sys
    print 'Thread id %s is still active ? %s' % (tid, tid in sys._current_frames().keys() )
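(Context for readers: CPython threads cannot be killed from outside, and del t only drops the local reference while the OS thread keeps running. A hedged sketch of one common mitigation, not from the original post: mark the worker as a daemon thread so a permanently hung download at least cannot keep the interpreter alive at exit.)

```python
import threading
import time

def worker():
    time.sleep(60)  # stands in for a request that never finishes

t = threading.Thread(target=worker)
t.daemon = True  # a hung daemon thread will not block interpreter exit
t.start()
t.join(0.1)      # wait briefly, then give up waiting

# Deleting t would only drop our reference; the thread keeps running.
print(t.is_alive())  # -> True: join() timing out does not stop the thread
```

This does not free the CPU or memory the stuck thread is using; it only stops the hang from propagating to process shutdown.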

Thank you.

Upvotes: 1

Views: 351

Answers (1)

user3843970

Reputation: 21

OK, I found a hack that deals with this issue.

The best solution so far is to cap the total amount of data read and stop reading from the socket once the cap is reached. The data is read by the _safe_read method of the httplib module. To override this method, I used this library: http://blog.rabidgeek.com/?tag=wraptools

And voila :

import httplib
from httplib import IncompleteRead
from wraptools import wraps

@wraps(httplib.HTTPResponse._safe_read)
def _safe_read(original_method, self, amt):
    """Read the number of bytes requested, compensating for partial
    reads, but stop once MAX_FILE_SIZE bytes have been read in total.

    Normally, we have a blocking socket, but a read() can be interrupted
    by a signal (resulting in a partial read).

    Note that we cannot distinguish between EOF and an interrupt when zero
    bytes have been read. IncompleteRead() will be raised in this
    situation.

    This function should be used when <amt> bytes "should" be present for
    reading. If the bytes are truly not available (due to EOF), then the
    IncompleteRead exception can be used to detect the problem.
    """
    s = []
    total = 0
    MAX_FILE_SIZE = 3 * 10**6
    while amt > 0 and total < MAX_FILE_SIZE:
        chunk = self.fp.read(min(amt, httplib.MAXAMOUNT))
        if not chunk:
            raise IncompleteRead(''.join(s), amt)
        total += len(chunk)
        s.append(chunk)
        amt -= len(chunk)
    return ''.join(s)

In this case, MAX_FILE_SIZE is set to 3 MB.
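The core idea, reading in chunks until a byte cap is hit, can also be expressed without monkey-patching httplib. A sketch of the same capped-read logic as a standalone helper, assuming any file-like response object (the names capped_read and CHUNK are my own):

```python
import io

MAX_FILE_SIZE = 3 * 10**6  # 3 MB cap, matching the value above
CHUNK = 8192

def capped_read(fp, max_bytes=MAX_FILE_SIZE):
    """Read from a file-like object, but stop after max_bytes."""
    parts, total = [], 0
    while total < max_bytes:
        chunk = fp.read(min(CHUNK, max_bytes - total))
        if not chunk:
            break  # normal EOF before hitting the cap
        parts.append(chunk)
        total += len(chunk)
    return b"".join(parts)

# A 5 MB stream gets truncated at the 3 MB cap:
data = capped_read(io.BytesIO(b"x" * (5 * 10**6)))
print(len(data))  # -> 3000000
```

Applied to an HTTP response, this truncates the body at the cap instead of raising, so callers should check the length if they need to know whether the download was complete.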

Hopefully, this will help others.

Upvotes: 1
