Reputation: 2871
I have a question about file upload that is more about how it works than about a code issue. I looked on the internet, but I couldn't find a proper answer.
I have a web application running on Tomcat, which handles file uploads (through a servlet). Let's say I now want to upload huge files (> 1 GB). My understanding was that the multipart content of the HTTP request is available in my servlet once the whole file has actually been transferred.
My question is: where is the content of the request actually stored? When one calls HttpServletRequest.getParts(), an InputStream is available on each Part object. However, where is the stream reading from? Does Tomcat store it somewhere?
I guess this might not be clear enough, so I'll update the post according to your comments, if any.
Thanks
Upvotes: 7
Views: 6820
Reputation: 1746
Here it is
I mention 1~5 because it is important to understand the stream returned by request.getInputStream(), which was required before the Servlet 3.x request.getParts() feature. Typically, Tomcat will deliver the request to the web app very soon; it does not need to wait for the client to finish uploading, so Tomcat need not buffer a lot of data. I left Java server-side development some years ago, before JSR-000315 was approved :-)
Upvotes: 0
Reputation: 20862
Tomcat follows the Servlet 3.0 specification which allows you to specify things such as how large of a multipart "part" can be before it gets stored (temporarily) on the disk, where temporary files will be written, what the maximum size of a file is, and what the maximum size of the whole request can be. You can find all kinds of good information about configuring multipart uploads (in Tomcat or any other spec-3.0-compliant server) here and here.
Tomcat's implementation specifics aren't terribly relevant: it adheres to the spec. If the file to be uploaded is smaller than the threshold set, then you should be able to read the bytes of the file from memory (i.e. no disk involved). If the file is larger, then it will be written to disk, first (in its entirety) and then you can get the bytes from the container.
So if you want to receive a 1GiB file and don't have that kind of memory available (I wouldn't recommend allowing clients to fill-up your heap with 1GiB of uploaded data for each upload... easy DoS if you just start several simultaneous 1GiB uploads and you are toast), then Tomcat (or whatever container you are using) will read the file (again, in its entirety) onto the disk, and when your servlet gets control, you can read the bytes back from that file.
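To make the threshold behavior concrete, here is a minimal sketch of a buffer that keeps bytes in memory until a limit is exceeded and then spills everything to a temporary file. This is not Tomcat's code: the class name ThresholdBuffer and all of its methods are hypothetical, and it only assumes the behavior described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only: in-memory buffer that spills to disk past a threshold. */
class ThresholdBuffer extends OutputStream {
    private final int threshold;   // maximum number of bytes kept in memory
    private final Path tempDir;    // directory where the spill file is created
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk;     // non-null once the data has spilled to disk
    private Path file;             // the spill file, if any
    private long written;          // total bytes written so far

    ThresholdBuffer(int threshold, Path tempDir) {
        this.threshold = threshold;
        this.tempDir = tempDir;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (disk == null && written + len > threshold) {
            // Threshold exceeded: copy what is buffered to a temp file and continue there.
            file = Files.createTempFile(tempDir, "upload_", ".tmp");
            disk = Files.newOutputStream(file);
            memory.writeTo(disk);
            memory = null;
        }
        if (disk != null) {
            disk.write(buf, off, len);
        } else {
            memory.write(buf, off, len);
        }
        written += len;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    /** True while all data still lives on the heap. */
    boolean isInMemory() {
        return disk == null;
    }

    /** Call after close(): reads back from memory or from the temp file. */
    InputStream getInputStream() throws IOException {
        return isInMemory()
                ? new ByteArrayInputStream(memory.toByteArray())
                : Files.newInputStream(file);
    }

    /** Mirrors the cleanup a container performs when the request ends. */
    void delete() throws IOException {
        if (file != null) {
            Files.deleteIfExists(file);
        }
    }
}
```

The caller reads through getInputStream() either way, which is why servlet code does not need to know whether a part was small enough to stay in memory.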
Note that the container must process the entire multipart request before any of your code really runs. That's to prevent you from breaking anything by partially-reading the request's InputStream or anything like that. Processing multipart requests is non-trivial, and it's easy to break things.
If you want to be able to stream large files for processing (e.g. huge XML files that can be processed serially), then you are going to want to handle the multipart parsing yourself. That way, you don't need a huge amount of heap to buffer the file and you don't need to store the file on the disk before you start processing it. (If this is your use-case, I suggest using HTTP PUT or HTTP POST and not using multipart requests.)
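As a rough sketch of the streaming approach, assuming the file is sent as the raw body of a PUT or POST (so the stream returned by request.getInputStream() contains only the file bytes), the body can be consumed in fixed-size chunks without ever holding it all in memory or on disk. The class name StreamingBodyProcessor is hypothetical, and hashing stands in for whatever serial processing you actually need.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

/** Hypothetical helper: consumes a request body chunk by chunk. */
class StreamingBodyProcessor {

    /**
     * Reads the stream in 8 KiB chunks and feeds each chunk to the digest.
     * Returns the total number of bytes seen. Only one chunk is held in memory at a time.
     */
    static long process(InputStream body, MessageDigest digest) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = body.read(chunk)) != -1) {
            digest.update(chunk, 0, n);   // replace with your own serial processing
            total += n;
        }
        return total;
    }
}
```

In a servlet this would be called from doPut or doPost with the request's input stream; reads block until the client has sent more data, so processing proceeds while the upload is still in progress.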
(It's worth mentioning that base64 encoding is not even mentioned in any specification for multipart processing. A few folks have mentioned base64 here, but I've never seen a standard web client use base64 for uploading a file using multipart/form-data. HTTP handles binary uploads just fine, thanks.)
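To illustrate the point about binary data, here is a small sketch that builds a single-part multipart/form-data body the way a typical client would: the file bytes are written as-is between the boundary lines, with no base64 step. The class name MultipartBodyDemo, the boundary string, and the field and file names are all made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the multipart/form-data wire format for one file part. */
class MultipartBodyDemo {

    static byte[] buildBody(String boundary, String fieldName, String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Part headers are plain ASCII text, terminated by an empty line.
        byte[] header = ("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n").getBytes(StandardCharsets.US_ASCII);
        out.write(header, 0, header.length);

        // The file content follows as raw bytes: no base64, no size expansion.
        out.write(fileBytes, 0, fileBytes.length);

        // Closing boundary marks the end of the multipart body.
        byte[] trailer = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.US_ASCII);
        out.write(trailer, 0, trailer.length);

        return out.toByteArray();
    }
}
```

Because the payload is copied verbatim, the body grows by exactly one byte for every byte of file content.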
Upvotes: 2
Reputation: 7459
Tomcat stores Parts in the "X:\some\path\Tomcat 7.0\temp" (/some/path/apache-tomcat-7.0.x/temp) directory.
When a multipart request is parsed, if the size of a single part exceeds a threshold, a temporary file is created for that part.
Your servlet/JSP will be invoked when the transfer of all parts has been completed.
When the request is destroyed, all temporary files are deleted as well.
If you are interested in the multipart parse phase, take a look at Apache commons-fileupload (specifically ServletFileUpload.parseRequest()); Tomcat is based on a variant of that.
UPDATE
you can configure it as a java arg, ie in windows:
Upvotes: 9
Reputation: 1824
I think we should step back for a moment and give some thought to the web infrastructure. First of all, HTTP transmits text data, so binary information is encoded in base64 so that the data won't get messed up. This ends up producing large amounts of data, which gives birth to the multipart form, which breaks the data into parts of encoded text with special markers that allow the server to assemble everything together. But to use this data we have to decode it first, and to do that we have to use the multiple parts of the form.
[a break so we can breathe]
Continuing: the browser needs to send lots of data (1GB as you mentioned in your example). This data is encoded with base64 and then separated into pieces (the multipart form) with its markers. The browser then starts to send the pieces to the server, but the server only returns the HTTP response once it has finished receiving and processing the HTTP request (or if a timeout occurs, which results in an error on the browser screen).
What we can assume here is that Tomcat could (I didn't check the internals) start decoding each part of the multipart that has already arrived (either from the temp file or from memory) and pass the InputStream to the user. Since reading from the InputStream is a blocking operation, the server would wait for the next piece of data to pass to Tomcat, which in turn would pass it to the program that is processing the data.
Once all data has reached the server, the program would prepare the response that Tomcat would return to the browser, completing the HTTP request-response cycle and closing the connection (since HTTP is a connectionless protocol).
Hope it helps :)
Upvotes: 2
Reputation: 23014
The InputStream will typically read from a temporary file which is created by the multipart framework during the request. The temp file is normally stored in the application server's temporary area - as specified by the servlet context attribute javax.servlet.context.tempdir. In Tomcat this is somewhere beneath $CATALINA_HOME/work. The file will be deleted once the request completes.
For small file sizes, the multipart framework may keep the whole upload in memory - in which case the InputStream will be reading directly from memory.
If you're using Spring's CommonsMultipartResolver then you can set the maximum upload size allowed in memory via the maxInMemorySize property. If an upload is bigger than this, then it will be stored as a temp file on disk.
Upvotes: 4