Tags: http-headers, multipart, content-length, http-content-length

Reputation: 1982

How is an HTTP multipart "Content-length" header value calculated?

I've read conflicting and somewhat ambiguous replies to the question "How is a multipart HTTP request content length calculated?". Specifically I wonder:

What is the precise content range for which the "Content-length" header is calculated?
Are CRLF ("\r\n") octet sequences counted as one or two octets?

Can someone provide a clear example to answer these questions?

Upvotes: 21

Answers (5)

7stud

Reputation: 48649

In reply to Moshe Rubin's answer:

RFC 2046, Section: Abstract

Because RFC 822 said so little about message bodies, these documents (Ed: RFC 2045, 2046, 2047, 2048, 2049) are largely orthogonal to (rather than a revision of) RFC 822.

RFC 2046, Section 5.1: Multipart Media Type

In the case of multipart entities, in which one or more different
sets of data are combined in a single body, a "multipart" media type
field must appear in the entity's header. The body must then contain one or more body parts, each preceded by a boundary delimiter line,
and the last one followed by a closing boundary delimiter line.
After its boundary delimiter line, each body part then consists of a
header area, a blank line, and a body area. Thus a body part is
similar to an RFC 822 message in syntax, but different in meaning.

RFC 2046, Section 5.1.1: Common Syntax

This Content-Type value indicates that the content consists of one or more parts, each with a structure that is syntactically identical to an RFC 822 message, except that the header area is allowed to be completely empty, and that the parts are each preceded by the line
<example of a boundary line>

RFC 2045, Section 2.1: "CRLF"

The term CRLF, in this set of documents, refers to the sequence of
octets corresponding to the two US-ASCII characters CR (decimal value 13) and LF (decimal value 10) which, taken together, in this order,
denote a line break in RFC 822 mail.

Note the use of the plural "octets". Based on those two RFCs, I do think that a CRLF is defined by the spec to be two octets--not one.

Next, according to RFC 2387 a multipart/related Content-Type header MUST include a type parameter:

RFC 2387: The MIME Multipart/Related Content-type

The type parameter must be specified and its value is the MIME media type of the "root" body part.

So, in order for Google to be creating a multipart/related request in accordance with the spec, the Content-Type header in the example should be:

Content-type: multipart/related; 
              boundary="....."; 
              type="application/json"
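As a rough illustration of what such a header carries, here is a minimal sketch using Python's standard email package (my choice for illustration; it is not what the Playground uses). The boundary value is copied from the example request, and the type value assumes the root part is the JSON part.

```python
from email.mime.multipart import MIMEMultipart

# Boundary value taken from the example request (without the leading "--").
BOUNDARY = "=" * 15 + "0688100289=="

# Extra keyword arguments become Content-Type parameters, so type=... adds
# the "type" parameter that RFC 2387 requires for multipart/related.
msg = MIMEMultipart("related", boundary=BOUNDARY, type="application/json")

content_type = msg.get_content_type()  # media type without its parameters
root_type = msg.get_param("type")      # media type of the "root" body part
boundary = msg.get_boundary()

print(content_type, root_type, boundary)
```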

Based on the RFCs quoted above, I do not think Google is following the spec in two instances.

Here's what the byte count should be for the body of the multipart request in the google example:

start character
|
V
--===============0688100289==
Content-type: application/json

{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}
--===============0688100289==
Content-type: text/plain

We're testing multipart uploading!
--===============0688100289==--
                              ^
                              |
                              end character

First, let's test a file that can easily be counted! Here's a simple file:

test_byte_count.txt

12345
12345

There are 5 bytes (= 8-bit bytes or octets) on the first line, followed by a newline, which is \n on my system, so 1 byte, then there are 5 bytes on the second line (no terminating newline). Confirmed by hexdump:

hexdump -c test_byte_count.txt 
0000000   1   2   3   4   5  \n   1   2   3   4   5                    
000000b

So, the byte count should be 11:

$  wc -c test_byte_count.txt
11 test_byte_count.txt

And, hexdump actually gives the number of bytes on the last line:

000000b

In hexadecimal notation, A is 10 and B is 11 and C is 12, etc., so hexdump is reporting that the file is 11 bytes long.

Next, as discussed above, the HTTP protocol requires newlines sent over the wire to be represented by the two octets \r\n, so the byte count for the file test_byte_count.txt with two octets for the newline should be 12. unix2dos will convert a file with Unix newlines to a file with DOS newlines, i.e. \r\n:

$ unix2dos -n test_byte_count.txt test_byte_count_dos_newlines.txt
unix2dos: converting file test_byte_count.txt to file test_byte_count_dos_newlines.txt in DOS format...

Here's what's in the file test_byte_count_dos_newlines.txt:

$ hexdump -c test_byte_count_dos_newlines.txt 
0000000   1   2   3   4   5  \r  \n   1   2   3   4   5                
000000c


$ wc -c test_byte_count_dos_newlines.txt 
12 test_byte_count_dos_newlines.txt
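The same check can be sketched in Python without any files. This is only an in-memory stand-in for the hexdump/wc/unix2dos steps above; the variable names are mine.

```python
# Contents of test_byte_count.txt: two 5-byte lines separated by one "\n".
unix_bytes = b"12345\n12345"

# Emulate unix2dos: every LF becomes the two-octet sequence CR LF.
dos_bytes = unix_bytes.replace(b"\n", b"\r\n")

unix_count = len(unix_bytes)
dos_count = len(dos_bytes)

# hex() shows the same values that hexdump prints on its last line.
print(unix_count, hex(unix_count))
print(dos_count, hex(dos_count))
```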

Therefore, I just need to use unix2dos to convert a file containing the body of the multipart request to a file with dos newlines, then get the byte count. First, here is the byte count before converting \n newlines to \r\n newlines:

$ cat multipart_request_unix_newlines.txt 
--===============0688100289==
Content-type: application/json

{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}
--===============0688100289==
Content-type: text/plain

We're testing multipart uploading!
--===============0688100289==--
$ wc -c multipart_request_unix_newlines.txt 
352 multipart_request_unix_newlines.txt

And, unix2dos can actually report the number of dos and unix newlines in a file:

$ unix2dos -i multipart_request_unix_newlines.txt            
       0       8       0  no_bom    text    multipart_request_unix_newlines.txt

The first and second column are the number of dos and unix newlines found in the file (the third column is for old Mac newlines \r). Eight unix newlines were found in the file, so when unix2dos converts those newlines to dos newlines we would expect 8 more bytes to be added to the file, and 8 bytes added to the previously reported byte count of 352, gives us 360 bytes. Therefore, we should expect there to be 360 bytes in the converted file:

$ unix2dos -n multipart_request_unix_newlines.txt multipart_request_dos_newlines.txt
unix2dos: converting file multipart_request_unix_newlines.txt to file multipart_request_dos_newlines.txt in DOS format...
$ wc -c multipart_request_dos_newlines.txt 
360 multipart_request_dos_newlines.txt

It appears likely that Google calculated the byte count of the body of the multipart request before converting the body to \r\n newlines.
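For anyone without unix2dos at hand, the experiment can be approximated in Python. This is a sketch that rebuilds the body from the text shown above (so it assumes the text was copied faithfully) and then does the LF to CRLF conversion with a plain byte replacement.

```python
# Delimiter line as it appears in the request body.
DELIMITER = "--" + "=" * 15 + "0688100289=="

JSON_PART = (
    '{"title": "test-multipart.txt", '
    '"parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], '
    '"properties": [{"kind": "drive#property", "key": "cloudwrapper", '
    '"value": "true"}]}'
)

lines = [
    DELIMITER,
    "Content-type: application/json",
    "",
    JSON_PART,
    DELIMITER,
    "Content-type: text/plain",
    "",
    "We're testing multipart uploading!",
    DELIMITER + "--",  # closing delimiter, no trailing newline
]

# Body with Unix newlines, as in multipart_request_unix_newlines.txt.
unix_body = "\n".join(lines).encode("ascii")
newline_count = unix_body.count(b"\n")

# Same body with CRLF line breaks, as unix2dos would produce.
dos_body = unix_body.replace(b"\n", b"\r\n")

print(len(unix_body), newline_count, len(dos_body))
```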

Upvotes: 1

Moshe Rubin

Reputation: 1982

The following live example should hopefully answer the questions.

Perform multipart request with Google's OAuth 2.0 Playground

Google's OAuth 2.0 Playground web page is an excellent way to perform a multipart HTTP request against the Google Drive cloud. You don't have to understand anything about Google Drive to do this -- I'll do all the work for you. We're only interested in the HTTP request and response. Using the Playground, however, will allow you to experiment with multipart and answer other questions, should the need arise.

Create a test file for uploading

I created a local text file called "test-multipart.txt", saved somewhere on my file system. The file is 34 bytes large and looks like this:

We're testing multipart uploading!

Open Google's OAuth 2.0 Playground

We first open Google's OAuth 2.0 Playground in a browser, using the URL https://developers.google.com/oauthplayground/:

Google OAuth 2.0 Playground opening screen

Fill in Step 1

Select the Drive API v2 and the "https://www.googleapis.com/auth/drive" scope, and press "Authorize APIs":

Fields filled in for Step 1

Fill in Step 2

Click the "Exchange authorization code for tokens" button:

Fields filled in for Step 2

Fill in Step 3

Here we give all relevant multipart request information:

Set the HTTP Method to "POST"
There's no need to add any headers; Google's Playground will add everything needed (e.g., headers, boundary sequence, content length)
Request URI: "https://www.googleapis.com/upload/drive/v2/files?uploadType=multipart"
Enter the request body: this is some meta-data JSON required by Google Drive to perform the multipart upload. I used the following:

{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}

At the bottom of the "Request Body" screen, choose the test-multipart.txt file for uploading.
Press the "Send the request" button

The request and response

Google's OAuth 2.0 Playground miraculously inserts all required headers, computes the content length, generates a boundary sequence, inserts the boundary string wherever required, and shows us the server's response.

Analysis

The multipart HTTP request succeeded with a 200 status code, so the request and response are good ones we can depend upon. Google's Playground inserted everything we needed to perform the multipart HTTP upload. You can see the "Content-length" is set to 352. Let's look at each line after the blank line following the headers:

--===============0688100289==\r\n
Content-type: application/json\r\n
\r\n
{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}\r\n
--===============0688100289==\r\n
Content-type: text/plain\r\n
\r\n
We're testing multipart uploading!\r\n
--===============0688100289==--

There are nine (9) lines, and I have manually added "\r\n" at the end of each of the first eight (8) lines (for readability reasons). Here is the number of octets (characters) in each line:

29 + '\r\n'
30 + '\r\n'
'\r\n'
167 + '\r\n'
29 + '\r\n'
24 + '\r\n'
'\r\n'
34 + '\r\n' (although '\r\n' is not part of the text file, Google inserts it)
31

The sum of the octets is 344, and considering each '\r\n' as a single one-octet sequence gives us the coveted content length of 344 + 8 = 352.
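The tally is easy to reproduce. The sketch below only redoes the arithmetic on the per-line figures listed above (the variable names are mine), and prints the total for both ways of counting a line break so the two can be compared.

```python
# Octets per line, excluding line breaks, in the order listed above.
line_lengths = [29, 30, 0, 167, 29, 24, 0, 34, 31]

# Only the first eight lines are followed by a line break.
line_breaks = 8

content_octets = sum(line_lengths)

# Total if each line break is counted as one octet.
total_one_per_break = content_octets + line_breaks * 1

# Total if each line break is counted as the two octets CR and LF.
total_two_per_break = content_octets + line_breaks * 2

print(content_octets, total_one_per_break, total_two_per_break)
```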

Summary

To summarize the findings:

The multipart request's "Content-length" is computed from the first byte of the boundary sequence following the header section's blank line, and continues until, and includes, the last hyphen of the final boundary sequence.
The '\r\n' sequences should be counted as one (1) octet, not two, regardless of the operating system you're running on.

NOTE: Many of the talkbacks believe that the '\r\n' sequence should be counted as two (2) octets, not one. Be sure to verify which is correct for your platform.

Upvotes: 7

Tatarize

Reputation: 10806

\r\n are two bytes.

Moshe Rubin's answer is wrong. The implementation it relies on is buggy.

I sent a curl request to upload a file and used Wireshark to capture the exact data sent over my network, a methodology that everybody should agree is more valid than "an online application somewhere gave me a number".

--------------------------de798c65c334bc76\r\n
Content-Disposition: form-data; name="file"; filename="requireoptions.txt"\r\n
Content-Type: text/plain\r\n
\r\n
Pillow
pyusb
wxPython
ezdxf
opencv-python-headless
\r\n--------------------------de798c65c334bc76--\r\n

Curl, which everybody will agree likely implemented this correctly: Content-Length: 250

> len("2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d646537393863363563333334626337360d0a436f6e74656e742d446973706f736974696f6e3a20666f726d2d646174613b206e616d653d2266696c65223b2066696c656e616d653d22726571756972656f7074696f6e732e747874220d0a436f6e74656e742d547970653a20746578742f706c61696e0d0a0d0a50696c6c6f770d0a70797573620d0a7778507974686f6e0d0a657a6478660d0a6f70656e63762d707974686f6e2d686561646c6573730d0a2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d646537393863363563333334626337362d2d0d0a")
500

(2 x 250 = 500; I copied the hex stream out of Wireshark.)

I took the actual binary there. Each '2d' is a hyphen (-); the run of them starts the boundary.

Please note that when I gave the server the wrong count, treating 0d0a as one octet rather than two (which makes no sense: they are octets and cannot be compound), it actively rejected the request as bad.

Also, this answers the second part of the question. The actual Content-Length covers everything here: from the first boundary to the last, including the trailing --\r\n, it's all the octets left on the wire.
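If you want to reproduce the count without a packet capture, here is a sketch that rebuilds the same body in Python. It assumes, as the hex stream suggests, that the file's lines are separated by CRLF and that the file has no trailing newline; the variable names are mine.

```python
CRLF = b"\r\n"

# curl's boundary: 24 hyphens plus 16 hex characters (delimiters add "--").
boundary = b"-" * 24 + b"de798c65c334bc76"

# Assumption: CRLF between lines, no newline after the last line.
file_content = CRLF.join(
    [b"Pillow", b"pyusb", b"wxPython", b"ezdxf", b"opencv-python-headless"]
)

body = (
    b"--" + boundary + CRLF
    + b'Content-Disposition: form-data; name="file"; '
    + b'filename="requireoptions.txt"' + CRLF
    + b"Content-Type: text/plain" + CRLF
    + CRLF
    + file_content
    + CRLF + b"--" + boundary + b"--" + CRLF
)

# Two hex digits per octet, which is why the hex stream is twice as long.
hex_stream = body.hex()

print(len(body), len(hex_stream), body.count(CRLF))
```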

Upvotes: 2

Pavel P

Reputation: 16892

If an HTTP message has a Content-Length header, then this header indicates the exact number of bytes that follow the HTTP headers. If anything decided to freely count \r\n as one byte then everything would fall apart: keep-alive HTTP connections would stop working, as the HTTP stack wouldn't be able to see where the next HTTP message starts and would try to parse random data as if it were an HTTP message.
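To make that concrete, here is a toy sketch of how a receiver frames back-to-back messages on one connection. It is not a real HTTP parser; split_messages, make_request and the example.test host are made up for illustration.

```python
def split_messages(stream: bytes) -> list[tuple[bytes, bytes]]:
    """Split back-to-back messages into (head, body) using Content-Length."""
    messages = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            # Leftover bytes that do not form a complete header section.
            messages.append((stream[pos:], b""))
            break
        head = stream[pos:head_end]
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_start = head_end + 4
        messages.append((head, stream[body_start:body_start + length]))
        pos = body_start + length
    return messages


def make_request(body: bytes, declared_length: int) -> bytes:
    """Build a request whose Content-Length may or may not be truthful."""
    return (
        b"POST /upload HTTP/1.1\r\nHost: example.test\r\n"
        + b"Content-Length: " + str(declared_length).encode("ascii")
        + b"\r\n\r\n" + body
    )


body = b"line one\r\nline two"  # 18 octets, the CRLF counts as two

# Two requests on one connection with a correct length.
good = split_messages(make_request(body, len(body)) * 2)

# Same, but with the CRLF miscounted as a single octet.
bad = split_messages(make_request(body, len(body) - 1) * 2)

print(good[1][0].split(b"\r\n")[0])
print(bad[1][0].split(b"\r\n")[0])
```

With the correct length the second message starts cleanly at POST; with the short length the last body byte is left over and gets glued onto the front of the next request line.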

Upvotes: 4

M Nottingham

Reputation: 5804

How you calculate Content-Length doesn't depend on the status code or media type of the payload; it's the number of bytes on the wire. So, compose your multipart response, count the bytes (and CRLF counts as two), and use that for Content-Length.

See: http://httpwg.org/specs/rfc7230.html#message.body.length
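A minimal sketch of "count the bytes" in Python: encode first, then take the length. The content_length helper and the sample strings are mine, and UTF-8 is an assumed charset; the point is only that the count is taken over octets, not characters or lines.

```python
def content_length(body: str, charset: str = "utf-8") -> int:
    """Number of octets the body occupies once encoded for the wire."""
    return len(body.encode(charset))


ascii_part = "line one\r\nline two"  # CR and LF are one octet each
accented_part = "caf\u00e9\r\n"      # the accented e is two octets in UTF-8

ascii_octets = content_length(ascii_part)
accented_octets = content_length(accented_part)

print(ascii_octets, len(accented_part), accented_octets)
```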

Upvotes: 11
