dma_k
dma_k

Reputation: 10639

URLConnection does not handle content length via proxy correctly

I faced the following problem: When URLConnection is used via proxy the content length is always set to -1.

First I checked that proxy really returns the Content-Length (lynx and wget are also working via proxy; there is no other way to go to internet from local network):

$ lynx -source -head ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
HTTP/1.1 200 OK
Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
Content-Type: application/x-zip-compressed
Content-Length: 30745
Connection: close
Date: Thu, 02 Feb 2012 17:18:52 GMT

$ wget -S -X HEAD ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
--2012-04-03 19:36:54--  ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
Resolving proxy... 10.10.0.12
Connecting to proxy|10.10.0.12|:8080... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
  Content-Type: application/x-zip-compressed
  Content-Length: 30745
  Connection: close
  Age: 0
  Date: Tue, 03 Apr 2012 17:36:54 GMT
Length: 30745 (30K) [application/x-zip-compressed]
Saving to: `WO2003-104476-001.zip'

In Java I wrote:

URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
int length = url.openConnection().getContentLength();
logger.debug("Got length: " + length);

and I get -1. I started to debug FtpURLConnection and it turned out that the necessary information is in underlying HttpURLConnection.responses field however it is never properly populated from there:

enter image description here (there is Content-Length: 30745 in headers). The content length is not updated when you start reading the stream or even after the stream was read. Code:

URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
URLConnection connection = url.openConnection();

logger.debug("Got length (1): " + connection.getContentLength());

InputStream input = connection.getInputStream();

byte[] buffer = new byte[4096];
int count = 0, len;
while ((len = input.read(buffer)) > 0) {
    count += len;
}

logger.debug("Got length (2): " + connection.getContentLength() + " but wanted " + count);

Output:

Got length (1): -1
Got length (2): -1 but wanted 30745

It seems like it is a bug in JDK6, so I have opened new bug#7168608.

Upvotes: 5

Views: 3119

Answers (3)

sw1nn
sw1nn

Reputation: 7328

Remember that proxies will often change the representation of the underlying entity. In your case I suspect the proxy is probably altering the transfer encoding. Which in turn makes the Content-Length meaningless even if supplied.

You are falling afoul of the following two sections of the HTTP 1.1 spec:

4.4 Message Length

  1. ...
  2. ...
  3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.

14.41 Transfer-Encoding

The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity.

Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding

Transfer-codings are defined in section 3.6. An example is:

Transfer-Encoding: chunked

If multiple encodings have been applied to an entity, the transfer- codings MUST be listed in the order in which they were applied. Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.

Many older HTTP/1.0 applications do not understand the Transfer- Encoding header.

So The URLConnection is then ignoring the Content-Length header, as per the spec because it is meaningless in the presence of chunked transfers

In your debugger screenshot it's not clear whether the Transfer-Encoding header is present. Please let us know...

On further investigation - it seems that lynx does not show all the headers returned when you issue a lynx -head. It is not showing the Transfer-Encoding header critical to this discussion.

Here's the proof of the discrepancy with a publically visible website

Ξ▶ lynx -useragent='dummy' -source -head http://www.bbc.co.uk                                                                                                                  
HTTP/1.1 302 Found
Server: Apache
X-Cache-Action: PASS (non-cacheable)
X-Cache-Age: 0
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 03 Apr 2012 13:33:06 GMT
Location: http://www.bbc.co.uk/mobile/
Connection: close

Ξ▶ wget -useragent='dummy' -S -X HEAD http://www.bbc.co.uk                                                                                                                 
--2012-04-03 14:33:22--  http://www.bbc.co.uk/
Resolving www.bbc.co.uk... 212.58.244.70
Connecting to www.bbc.co.uk|212.58.244.70|:80... connected.
HTTP request sent, awaiting response... 
HTTP/1.1 200 OK
Server: Apache
Cache-Control: private, max-age=15
Etag: "7e0f292b2e5e4c33cac1bc033779813b"
Content-Type: text/html
Transfer-Encoding: chunked
Date: Tue, 03 Apr 2012 13:33:22 GMT
Connection: keep-alive
X-Cache-Action: MISS
X-Cache-Age: 0
X-LB-NoCache: true
Vary: Cookie

Since I am obviously not inside your network I can't replicate your exact circumstances, but please validate that you really aren't getting a Transfer-Encoding header when passing through a proxy.

Upvotes: 2

jtahlborn
jtahlborn

Reputation: 53694

I think it's a "bug" in the jdk related to handling ftp connections which are proxied. The FtpURLConnection delegates to an HttpURLConnection when a proxy is in use. however, the FtpURLConnection doesn't seem to delegate any of the header management to this HttpURLConnection in this situation. thus, you can correctly get the streams, but i don't think you can access any "header" values like content length or content type. (this is based on a quick glance over the openjdk source for 1.6, i could have missed something).

Upvotes: 1

Sasha O
Sasha O

Reputation: 3749

One thing to check I would do is to actually read the response (writing off the top of my head so expect mistakes):

URLConnection connection= url.openConnection();
InputStream input= connection.getInputStream();
byte[] buffer= new byte[4096];
while(input.read(buffer) > 0)
  ;
logger.debug("Got length: " + getContentLength());

If the size you are getting is good, then look for a way for make URLConnection read the header but not the data to avoid reading the whole response.

Upvotes: 0

Related Questions