Reputation: 1975
In Go 1.13, I have an upload server. This server accepts two types of upload: chunked, and chunked+threaded. For chunked uploads everything works as expected: I hash every chunk while it is written to disk, and the user uploads the chunks one by one, in order.
This means I can save the SHA1 state to disk after each chunk using BinaryMarshaler, then read the previous state back and keep hashing the next chunks until I reach the final hash. The final hash matches the whole file's SHA1 perfectly.
As long as the chunks arrive in order, I can append to the existing state. The problem starts with the threaded (simultaneous) uploads:
hashComplete := sha256.New()

// Read the previous hash state from disk, if any.
state, err := ioutil.ReadFile(ctxPath)
if err != nil {
    return err
}
if len(state) > 0 {
    unmarshaler := hashComplete.(encoding.BinaryUnmarshaler)
    if err := unmarshaler.UnmarshalBinary(state); err != nil {
        return err
    }
}

// Write the chunk to disk and hash it at the same time.
// file is a plain *os.File, src is the chunk source (io.Reader).
writer := io.MultiWriter(file, hashComplete)
if _, err := io.Copy(writer, src); err != nil {
    return err
}

// Snapshot the new hash state and save it to disk.
marshaler := hashComplete.(encoding.BinaryMarshaler)
newState, err := marshaler.MarshalBinary()
if err != nil {
    return err
}
if _, err := shaCtxFile.Write(newState); err != nil {
    return err
}
Later, after the upload finishes, I read this state file and get the final SHA1 hex from it. It is correct.
That is chunked upload in a specific, good order. The other upload method is chunked+threaded: the user can upload chunks simultaneously, then send a final request to concatenate them in a given order.
I already calculate each chunk's SHA1 and save it to disk.
My question is: is it possible to combine those states to get the final hash, or do I need to rehash after concatenating? Is there a way to combine those states?
Upvotes: 2
Views: 833
Reputation: 94058
Assuming you mean the final hash over the whole file, then no, you cannot combine multiple SHA-1 hashes over partial data to create the hash of the whole file as if it were calculated in one pass. The reason is that the initial SHA-1 state is always the same, and hashing each chunk restarts from that fixed initial state. Furthermore, the final block is padded and a length is appended (internal to the hash function) before the final hash value is calculated.
However, you can of course create a hash list or hash tree, where you define how big the blocks are. You then hash all the chunk hashes together to create a topmost hash value. You now have a different hash value than just the SHA-1 over the file, but the hash is consistent with your definition and can be recalculated, even in a multi-threaded fashion. It is still unique for the data within the file (assuming, of course, that you feed in the chunk hashes sequentially), so it can be used to validate the integrity of the file. And, as far as I know, for a normal secure hash function that is the only way to do multi-threaded hash calculation.
For more information, search for Merkle trees.
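A hash list as described above can be sketched like this (a minimal illustration; the function name `topHash` and the fixed chunk slices are mine):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// topHash builds a simple hash list: hash each chunk, then hash the
// concatenation of the per-chunk digests in order. The result is NOT the
// same value as SHA-256 over the whole file, but it is deterministic,
// and the per-chunk digests can be computed in parallel.
func topHash(chunks [][]byte) string {
	top := sha256.New()
	for _, c := range chunks {
		d := sha256.Sum256(c) // each chunk digest is independent of the others
		top.Write(d[:])
	}
	return hex.EncodeToString(top.Sum(nil))
}

func main() {
	chunks := [][]byte{[]byte("part one"), []byte("part two")}
	fmt.Println(topHash(chunks))
}
```

Verification only needs the same chunk boundaries and the same ordering of the digests, so the server can hash uploaded chunks in any order and combine the digests at concatenation time.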
Of course, SHA-1 has been broken for collision resistance, and unfortunately that is exactly what you are using it for, so please use SHA-256. If 256 bits is too much, then using SHA-256 and taking the leftmost 160 bits is a more secure alternative.
Upvotes: 4