Reputation: 33
I'm in a situation where I have to construct the body for a multipart/form-data POST request manually. I understand the structure just fine, and I can successfully upload a form that does not include files. I have a file as a File object and I need to interpret the contents of the file as a string to include them in the body of the request. All the examples I have come across of multipart form data with files just have something like "contents of file go here" where the file is included, and never discuss how to get from file to string. The top answer for this question comes close to what I'm looking for, but I'd prefer to avoid the extra overhead of base64 since my form will be handling many files. I have found that
`
--${boundary}
Content-Disposition: form-data; name="file"; filename="${file.name}"
Content-Type: ${file.type}
${await file.text()}`
works for a simple pdf but fails with a jpeg (here "fails" means that my server cannot parse the image correctly).
I have a working example using a FormData instance with Fetch (I cannot use FormData in production). In the Chrome developer tools I can get the raw body of the request to see what the file looks like. Here's what the beginning of the file looks like there:
Content-Disposition: form-data; name="file"; filename="test.jpg"
Content-Type: image/jpeg
ÿØÿî!AdobedÀ E¿d„¾¤ÿÛ„
$$''$$53335;;;;;;;;;;
Using file.text(), the same portion of the message looks like:
����!Adobed� E�d������
$$''$$5333
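The replacement characters above are consistent with how `file.text()` works: it decodes the bytes as UTF-8, and any byte sequence that is not valid UTF-8 is replaced by U+FFFD. That would also explain why a mostly-ASCII PDF can survive while a JPEG does not. A minimal sketch, assuming `TextDecoder` as a stand-in for `file.text()` and the first three JPEG bytes (FF D8 FF) as sample input:

```javascript
// Sample input: the first three bytes of a JPEG file (FF D8 FF).
const jpegStart = new Uint8Array([0xff, 0xd8, 0xff]);

// Blob.text() / File.text() decode as UTF-8; TextDecoder does the same here.
const decoded = new TextDecoder("utf-8").decode(jpegStart);

// None of these bytes forms valid UTF-8, so each one is replaced by U+FFFD
// and the original byte values cannot be recovered from the string.
const codePoints = [...decoded].map((ch) => ch.codePointAt(0).toString(16));
console.log(codePoints.join(" ")); // fffd fffd fffd
```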
When the file is decoded like this:
result += `
--${boundary}
Content-Disposition: form-data; name="file"; filename="${file.name}"
Content-Type: ${file.type}
${String.fromCharCode.apply(null, new Uint8Array(await file.arrayBuffer()))}`
The beginning of the file looks correct but comparing the full strings shows that there are a few differences.
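One likely source of those differences: `String.fromCharCode` keeps each byte as one code unit, but a string request body is encoded as UTF-8 before it is sent, so every byte above 0x7F turns into two bytes on the wire. A small sketch, assuming `TextEncoder` as a stand-in for the encoding `fetch` applies to string bodies (the sample bytes are illustrative):

```javascript
// Sample bytes: FF D8 (non-ASCII) followed by 41 ("A", plain ASCII).
const bytes = new Uint8Array([0xff, 0xd8, 0x41]);

// One code unit per byte, as String.fromCharCode.apply(null, ...) does.
const binaryString = String.fromCharCode(...bytes);

// A string body is sent as UTF-8; TextEncoder performs the same encoding.
const onTheWire = new TextEncoder().encode(binaryString);

console.log(binaryString.length); // 3
console.log(Array.from(onTheWire, (b) => b.toString(16)).join(" ")); // c3 bf c3 98 41
```

Separately, `String.fromCharCode.apply` with a very large array can throw a `RangeError` in some engines because of argument-count limits, which matters for large files.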
I found this
4.3 Encoding
While the HTTP protocol can transport arbitrary binary data, the
default for mail transport is the 7BIT encoding. The value supplied
for a part may need to be encoded and the "content-transfer-encoding"
header supplied if the value does not conform to the default
encoding. [See section 5 of RFC 2046 for more details.]
in RFC 2388, but I believe this is referring to how the request body is sent over-the-wire and not about how the body is constructed. I feel like I'm missing some core concept here. Any help will be greatly appreciated.
EDIT: Here is how the form data is being sent to my server:
const response = await fetch(url, {
  method: 'POST', // *GET, POST, PUT, DELETE, etc.
  mode: 'cors', // no-cors, *cors, same-origin
  cache: 'no-cache', // *default, no-cache, reload, force-cache, only-if-cached
  credentials: 'same-origin', // include, *same-origin, omit
  redirect: 'follow', // manual, *follow, error
  referrer: 'no-referrer', // no-referrer, *client
  body: serializedData, // body data type must match "Content-Type" header
  headers: {
    'Content-Type': 'multipart/form-data; boundary=' + boundary,
  },
})
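If `serializedData` in the call above is a string, the file bytes have already passed through a text encoding by the time they are sent. One way to avoid that, sketched here under assumptions, is to never convert the file to a string: build the body as a `Blob` made of string headers plus the `File` itself. The helper name `buildFileBody`, its parameters, and the sample data are hypothetical.

```javascript
// Hypothetical helper: builds a multipart/form-data body as a Blob.
// `file` can be a File or any Blob; the other parameters are illustrative.
function buildFileBody(boundary, fieldName, file, filename) {
  const head =
    `--${boundary}\r\n` +
    `Content-Disposition: form-data; name="${fieldName}"; filename="${filename}"\r\n` +
    `Content-Type: ${file.type || "application/octet-stream"}\r\n` +
    `\r\n`; // blank line separates the part headers from the content
  const tail = `\r\n--${boundary}--\r\n`;

  // Strings are encoded as UTF-8; the file's bytes are appended untouched.
  return new Blob([head, file, tail]);
}

// Sample data standing in for a real File: three JPEG bytes.
const sampleFile = new Blob([new Uint8Array([0xff, 0xd8, 0xff])], {
  type: "image/jpeg",
});
const body = buildFileBody("example-boundary", "file", sampleFile, "test.jpg");
console.log(body.size); // header bytes + 3 file bytes + closing boundary bytes
```

The resulting Blob can be passed as `body` in the same `fetch` call, with the `Content-Type` header unchanged. Because the file is referenced rather than read into a string, this also avoids the base64 overhead.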
Upvotes: 0
Views: 5347
Reputation: 33
4.10.21.7 Multipart form data
The multipart/form-data encoding algorithm, given an entry list and encoding, is as follows:
1. Let result be the empty string.
2. For each entry in entry list:
   1. For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a U+0023 NUMBER SIGN character (#), one or more ASCII digits representing the code point of the character in base ten, and finally a U+003B (;).
3. Encode the (now mutated) entry list using the rules described by RFC 7578, Returning Values from Forms: multipart/form-data, and return the resulting byte stream. [RFC7578]
Each entry in entry list is a field, the name of the entry is the field name and the value of the entry is the field value.
The order of parts must be the same as the order of fields in entry list. Multiple entries with the same name must be treated as distinct fields.
The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above.
File names included in the generated multipart/form-data resource (as part of file fields) must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. newlines could be removed from file names, quotes could be changed to "%22", and characters not expressible in the selected character encoding could be replaced by other characters).
The boundary used by the user agent in generating the return value of this algorithm is the multipart/form-data boundary string. (This value is used to generate the MIME type of the form submission payload generated by this algorithm.)
For details on how to interpret multipart/form-data payloads, see RFC 7578. [RFC7578]
-- HTML: The Living Standard
This definitely answers my question, but I'm still a bit confused on the implementation.
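As a starting point for the implementation, two of the rules quoted above can be sketched directly: non-file parts carry no Content-Type header, and quotes in names can be changed to "%22". The helper names `escapeParam` and `textPart` are hypothetical, and percent-encoding CR and LF (rather than removing them) is an assumption on my part.

```javascript
// Hypothetical helper: escape characters that would break a quoted
// name="..." or filename="..." parameter. The quote-to-%22 mapping comes from
// the quoted text; encoding CR and LF the same way is an assumption.
function escapeParam(value) {
  return value
    .replace(/\r/g, "%0D")
    .replace(/\n/g, "%0A")
    .replace(/"/g, "%22");
}

// Hypothetical helper: one non-file field. Note there is no Content-Type header.
function textPart(boundary, name, value) {
  return (
    `--${boundary}\r\n` +
    `Content-Disposition: form-data; name="${escapeParam(name)}"\r\n` +
    `\r\n` + // blank line between the part headers and the value
    `${value}\r\n`
  );
}

console.log(JSON.stringify(textPart("example-boundary", 'my "field"', "hello")));
```

Text parts like this are safe to keep as strings, since they are meant to be encoded as text; only the file content needs to stay as raw bytes.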
Upvotes: 0