guilhermecgs

Reputation: 3061

HTTPS protocol file integrity

I understand that when you send a file from a client to a server over HTTP/HTTPS, you have the guarantee that all data sent successfully arrived at the destination. However, if you are sending a huge file and the internet connection suddenly goes down, not all packets are sent and, therefore, you lose the logical integrity of the file.

Is there any point I am missing in my statement?

I would like to know if there is a way for the destination node to check file logical integrity without using a "custom code/api".

Upvotes: 5

Views: 4629

Answers (3)

Marcus Müller

Reputation: 36462

HTTPS is just HTTP over a TLS layer, so all of this applies to HTTPS, too:

HTTP is typically transported over TCP/IP (or, nowadays, possibly over QUIC). TCP (and QUIC) provides reliable delivery (i.e. lost packets are retransmitted) and checksums (i.e. the probability that data gets altered without the receiver noticing and the packet being resent is small). So if you're really just transferring data, you're basically set (as long as your HTTP server is configured to send the length of your file in bytes, which, at least for static files, it usually is).

If your transfer stops before the file size that the server advertised in its reply to the HTTP GET is reached, your client will know! Many HTTP libraries/clients can resume interrupted HTTP transfers (if the server supports range requests).
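To make the length check concrete, here is a minimal Python sketch. The helper names (`expected_length`, `is_complete`, `resume_headers`) are made up for illustration and are not part of any HTTP library; the sketch assumes the server sent a `Content-Length` header and accepts byte-range requests.

```python
from typing import Dict, Mapping, Optional


def expected_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the advertised Content-Length, or None if the header is absent."""
    for name, value in headers.items():
        # Header field names are case-insensitive.
        if name.lower() == "content-length":
            return int(value)
    return None


def is_complete(headers: Mapping[str, str], received: int) -> bool:
    """True if the number of body bytes received matches Content-Length."""
    total = expected_length(headers)
    if total is None:
        raise ValueError("no Content-Length; completeness cannot be judged this way")
    return received == total


def resume_headers(received: int) -> Dict[str, str]:
    """Request header asking the server for everything after the bytes we have."""
    return {"Range": f"bytes={received}-"}
```

If `is_complete` returns false, the client can repeat the request with `resume_headers(received)` and append the new data, provided the server answers with `206 Partial Content`. A server that ignores the `Range` header will reply `200` with the full body, in which case the client should start over.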

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15 even specifies an MD5 checksum header field (Content-MD5). You can configure web servers to send that field, and clients might use it to verify the overall file integrity.

EDIT: Content-MD5 as specified by RFC 2616 is deprecated (it was removed in RFC 7231). You can now use a content digest (the Content-Digest header field of RFC 9530), which is much more flexible.
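As a rough illustration of what a content digest looks like, the Python sketch below builds and checks a `Content-Digest` value using SHA-256. The function names are hypothetical, and the check assumes the header carries exactly one `sha-256` entry; a real implementation would parse the field properly, since it may list several algorithms.

```python
import base64
import hashlib


def content_digest(body: bytes) -> str:
    """Build a Content-Digest field value: sha-256=:<base64 of the hash>:"""
    digest = hashlib.sha256(body).digest()
    return "sha-256=:" + base64.b64encode(digest).decode("ascii") + ":"


def verify_content_digest(header_value: str, body: bytes) -> bool:
    """Compare a received header value against the digest of the received body.

    Assumption: the header holds a single sha-256 entry and nothing else.
    """
    return header_value.strip() == content_digest(body)
```

The sender would attach `Content-Digest: <content_digest(body)>` to the message, and the receiver would call `verify_content_digest` once the whole body has arrived.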

Also, you mention that you want to check the file that a client sends to a server. That problem might be quite a bit harder -- whilst you're usually in total control of your web server, you can't force an arbitrary client (e.g. a browser) to hash its file before uploading.

If, on the other hand, you are in fact in control of the client's HTTP implementation, you could most probably also use something more file-transfer oriented than plain HTTP -- think WebDAV, AtomPub etc., which are protocols on top of HTTP, or even more file-exchange oriented protocols like rsync (which I'd heartily recommend if you're actually syncing stuff -- it reduces network usage to a minimum if both sides' versions only differ partially). If for some reason your users share most of their data within a well-defined circle (for example, you're building something where photographers share their albums), you might even just use BitTorrent, which has per-chunk hashing, extensive load balancing options, and allows for "plain old HTTP seeds".

Upvotes: 8

joozek

Reputation: 2211

There are several issues here:

  1. As Marcus stated in his answer, TCP protects your bytes from being accidentally corrupted, but it doesn't help if the download was interrupted
  2. HTTPS additionally ensures that those bytes weren't tampered with between the server and the client (you)
  3. If you want to verify the integrity of a file (whether or not its transfer was interrupted), you should use a checksum designed to protect against accidental file corruption (e.g. CRC32; there could be better ones, you should check)
    1. If in addition you use HTTPS, then you're safe from intentional attacks too, because you know your checksum is OK and that the file parts you got weren't tampered with.
  4. If you use a checksum but don't use HTTPS (and you really should), then you should be safe against accidental data corruption but not against malicious attacks. That could be mitigated, but it's outside the scope of this question
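For point 3, a minimal Python sketch of computing a checksum over a file without loading it into memory. The function name `checksums` and the chunk size are arbitrary choices for illustration. It returns both a CRC32 (cheap, catches accidental corruption) and a SHA-256 hex digest (a cryptographic hash, shown only as one example of a "better one").

```python
import hashlib
import zlib
from typing import BinaryIO, Tuple


def checksums(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Tuple[int, str]:
    """Read a binary stream in chunks and return (crc32, sha256 hex digest)."""
    crc = 0
    sha = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # zlib.crc32 takes the running value so the CRC can be built up incrementally.
        crc = zlib.crc32(chunk, crc)
        sha.update(chunk)
    return crc & 0xFFFFFFFF, sha.hexdigest()
```

Both sides run this over their copy of the file (for example `checksums(open(path, "rb"))`) and compare the results. How the expected value gets from one side to the other is up to you, and that channel needs HTTPS if you care about point 4.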

Upvotes: 4

Julian Reschke

Reputation: 42045

In HTTP/1.1, the recipient can always detect whether it received a complete message (either by comparing the number of bytes received against Content-Length, or by properly handling Transfer-Encoding: chunked).
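To show why chunked encoding makes truncation detectable, here is a simplified Python sketch of a chunked-body decoder. `decode_chunked` is a hypothetical helper, not a library function; it ignores chunk extensions and trailer fields and is more lenient about the size field than a strict parser would be. The key point is that a complete message must end with a zero-length chunk.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode a chunked message body; raise ValueError if it is truncated."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hexadecimal size line ending in CRLF.
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("truncated: missing chunk-size line")
        size_field = raw[pos:eol].split(b";", 1)[0].strip()  # drop extensions
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            # The zero-length chunk marks the end; trailers are ignored here.
            return bytes(body)
        chunk = raw[pos:pos + size]
        if len(chunk) < size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated: chunk shorter than its declared size")
        body += chunk
        pos += size + 2
```

For example, `decode_chunked(b"5\r\nhello\r\n0\r\n\r\n")` returns the five bytes of `hello`, while the same input cut off before the `0` chunk raises `ValueError`. In practice your HTTP library does this for you and reports an incomplete read.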

(Adding content hashes can help if you suspect bit errors on the transport layer.)

Upvotes: 1
