jdizzle
jdizzle

Reputation: 4154

Constipated Python urllib2 sockets

I've been scouring the Internet looking for a solution to my problem with Python. I'm trying to use a urllib2 connection to read a potentially endless stream of data from an HTTP server. It's part of some interactive communication, so it's important that I can get the data that's available, even if it's not a whole buffer full. There seems to be no way to have read \ readline return the available data. It will block forever waiting for the entire (endless) stream before it returns.

Even if I set the underlying file descriptor to non-blocking using fnctl, the urllib2 file-object still blocks!! In general there seems to be no way to make python file-objects, upon read, return all available data if there is some and block otherwise.

I've seen a few posts about people seeking help with this, but I have seen no solutions. What gives? Am I missing something? This seems like such a normal use-case to completely ruin! I'm hoping to utilize urllib2's ability to detect configured proxies and use chunked encoding, but I can't if it won't cooperate.

Edit: Upon request, here is some example code

Client:

connection = urllib2.urlopen(commandpath)
id = connection.readline()

Now suppose that the server is using chunked transfer encoding, and writes one chunk down the stream and the chunk contains the line, and then waits. The connection is still open, but the client has data waiting in a buffer.

I cannot get read or readline to return the data I know it has waiting for it, because it tries to read until the end of the connection. In this case the connection may never close so it will wait either forever or until an inactivity timeout occurs, severing the connection. Once the connection is severed it will return, but that's obviously not the behavior I want.

Upvotes: 1

Views: 1238

Answers (1)

Wim
Wim

Reputation: 11252

urllib2 operates at the HTTP level, which works with complete documents. I don't think there's a way around that without hacking into the urllib2 source code.

What you can do is use plain sockets (you'll have to talk HTTP yourself in this case), and call sock.recv(maxbytes) which does read only available data.

Update: you may want to try to call conn.fp._sock.recv(maxbytes), instead of conn.read(bytes) on an urllib2 connection.

Upvotes: 1

Related Questions