Reputation: 4752
I'm using Typhoeus as an example, but a solution with any Ruby HTTP library is fine. Assume there are 10,000 URLs that look like this:
http://example.com/somerandomstringwithoutextension
If I run the following code on a 5 GB video, it will crash the app since it will try to load the whole video into memory.
res = Typhoeus::Request.new(url, timeout: 15, followlocation: true).run
If I make a HEAD request to every single URL first to determine its content type and content size, that helps with the memory problem, but it takes almost twice as long (0.7 sec for the HEAD request and then 0.7 sec for the actual request).
Is there any way to make an HTTP request in Ruby, watch how much content has been transferred so far, and drop the request if it reaches a certain limit? E.g. drop requests if they are bigger than 5 MB? Alternatively, drop a request based on its content type.
Upvotes: 1
Views: 125
Reputation: 769
It might be possible, but it's complicated.
According to the HTTP/1.1 spec, there is actually a "partial GET":
The semantics of the GET method change to a "partial GET" if the request message includes a Range header field. A partial GET requests that only part of the entity be transferred, as described in section 14.35. The partial GET method is intended to reduce unnecessary network usage by allowing partially-retrieved entities to be completed without transferring data already held by the client.
You could set the Range header field to issue a "partial GET", but that only works if the server supports it. Also, I doubt that the Typhoeus client supports partial GET; you may have to use Net::HTTP to achieve that, and I'm not sure that's achievable either.
I would suggest you stick to the original plan: HEAD first, then GET, since that's what HEAD was designed for.
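For what it's worth, here is a rough sketch of what a partial GET could look like with Net::HTTP. It sets the Range header by hand and also streams the body with a byte cap, since a server that ignores Range will answer 200 and send everything. The names `MAX_BYTES`, `ranged_get` and `fetch_capped` are made up for this example, and the sketch does not follow redirects the way `followlocation: true` does, so treat it as a starting point rather than a drop-in replacement.

```ruby
require 'net/http'
require 'uri'

MAX_BYTES = 5 * 1024 * 1024 # hypothetical 5 MB cap

# Build a GET that asks the server for only the first `limit` bytes.
def ranged_get(uri, limit)
  req = Net::HTTP::Get.new(uri)
  req['Range'] = "bytes=0-#{limit - 1}"
  req
end

# Stream the response and stop as soon as more than `limit` bytes arrive.
# Returns the body as a binary String, or nil if the cap was exceeded.
def fetch_capped(url, limit = MAX_BYTES)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  open_timeout: 15, read_timeout: 15) do |http|
    http.request(ranged_get(uri, limit)) do |res|
      body = String.new # binary buffer
      res.read_body do |chunk|
        body << chunk
        # Leaving the method here closes the connection, so the rest
        # of the file is never downloaded.
        return nil if body.bytesize > limit
      end
      return body
    end
  end
end
```

A 206 status on the response would mean the server honoured the range; with a 200 the cap inside `read_body` is the only thing keeping memory bounded.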
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
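A minimal sketch of the HEAD-then-GET plan with Net::HTTP might look like the following. `MAX_BYTES`, `SKIPPED_TYPES`, `worth_fetching?` and `fetch_if_small` are hypothetical names chosen for illustration, and refusing responses that lack a Content-Length is an assumption of this sketch, not something the spec requires.

```ruby
require 'net/http'
require 'uri'

MAX_BYTES = 5 * 1024 * 1024               # hypothetical size cap
SKIPPED_TYPES = %w[video/ audio/].freeze  # hypothetical content-type prefixes to skip

# Decide from the HEAD metadata alone whether the body is worth fetching.
# An unknown size (no Content-Length) is refused in this sketch.
def worth_fetching?(content_type, content_length, limit = MAX_BYTES)
  return false if content_length.nil? || content_length > limit
  SKIPPED_TYPES.none? { |prefix| content_type.to_s.start_with?(prefix) }
end

# HEAD first, then GET over the same connection when the checks pass.
# Returns the body, or nil when the resource was skipped.
def fetch_if_small(url, limit = MAX_BYTES)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    head = http.head(uri.request_uri)
    next nil unless worth_fetching?(head['Content-Type'], head.content_length, limit)
    http.get(uri.request_uri).body
  end
end
```

Issuing both requests inside one `Net::HTTP.start` block lets them share a connection when the server allows keep-alive, which may claw back some of the extra time the second request costs.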
Upvotes: 2