User12

Reputation: 148

Why does my SHA-256 checksum not match the checksum returned by AWS Glacier?

I have an archive file on an Ubuntu server. I uploaded it to AWS Glacier using the AWS CLI. When the upload finished, AWS returned a checksum like this:

{"checksum": "6c126443c882b8b0be912c91617a5765050d7c99dc43b9d30e47c42635ab02d5"}

But when I computed the checksum on my own server like this:

sunny@server:~$ sha256sum backup.zip

it returned this checksum:

5ba29292a350c4a8f194c78dd0ef537ec21ca075f1fe649ae6296c7100b25ba8

Why do the two checksums differ?

Upvotes: 1

Views: 806

Answers (2)

drinkcat

Reputation: 1

In case somebody (like me) stumbles upon this:

I believe the answer from Anon Coward is incorrect. You should replace the second part of the code with this instead:

# Now calculate each level of the tree till one digest remains
# (Note: it's not actually a tree)
allchunks = b''
for chunk in chunks:
    allchunks += chunk
final = hashlib.sha256(allchunks).digest()

Also note that you need a get_object_attributes call to find out the chunk size (if you use boto3, the default size is 8 MB, not 1 MB).
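
To make this variant concrete, here is a minimal, self-contained sketch of the flat approach described above: hash each fixed-size part, then hash the concatenation of the part digests. The function name `flat_chunk_hash` and the `part_size` parameter are illustrative only and not part of any AWS API. The 8 MB default simply mirrors the boto3 default mentioned above; the part size actually used for the upload is what matters.

```python
import hashlib
import io


def flat_chunk_hash(fileobj, part_size=8 * 1024 * 1024):
    """Hypothetical helper: SHA-256 over the concatenated per-part SHA-256 digests."""
    digests = []
    while True:
        part = fileobj.read(part_size)
        if not part:
            break
        digests.append(hashlib.sha256(part).digest())
    # Single flat pass over all part digests (no pairwise tree)
    return hashlib.sha256(b"".join(digests)).hexdigest()


if __name__ == "__main__":
    # Tiny in-memory example with an artificially small part size
    data = b"a" * 10 + b"b" * 10
    print(flat_chunk_hash(io.BytesIO(data), part_size=10))
```

The result depends on the part size, so the same file hashed with 1 MB parts and with 8 MB parts will generally give different values.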

Upvotes: 0

Anon Coward

Reputation: 10828

While the checksum returned by Glacier uses SHA-256, it is not a simple SHA-256 sum over the entire object. Rather, Glacier computes a hash for each 1 MiB chunk of data, then a hash for each pair of those hashes, and repeats the pairing until one hash remains. For more information, see the documentation.

Here is a simple implementation in Python:

#!/usr/bin/env python3
import hashlib
import sys
import binascii

# Given a file object (opened in binary mode), calculate the checksum used by glacier
def calc_hash_tree(fileobj):
    chunk_size = 1048576

    # Calculate a list of hashes for each chunk in the fileobj
    chunks = []
    while True:
        chunk = fileobj.read(chunk_size)
        if len(chunk) == 0:
            break
        chunks.append(hashlib.sha256(chunk).digest())
    
    # Now calculate each level of the tree till one digest remains
    while len(chunks) > 1:
        next_chunks = []
        while len(chunks) > 1:
            next_chunks.append(hashlib.sha256(chunks.pop(0) + chunks.pop(0)).digest())
        if len(chunks) > 0:
            next_chunks.append(chunks.pop(0))
        chunks = next_chunks

    # The final remaining hash is the root of the tree:
    return binascii.hexlify(chunks[0]).decode("utf-8")

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        print(calc_hash_tree(f))

You can call it on a single file like this:

$ ./glacier_checksum.py backup.zip
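
As a quick sanity check of the scheme described above, the following sketch compares a tree hash with a plain SHA-256 on in-memory data. The helper `tree_hash` is a compact, illustrative re-statement of the pairing logic (the name and the `chunk_size` parameter are assumptions for this example, and empty input is not handled). It shows that the two values agree when the data fits in a single chunk and differ once there is more than one chunk, which is presumably what happened with backup.zip.

```python
import hashlib
import io

MIB = 1024 * 1024


def tree_hash(fileobj, chunk_size=MIB):
    """Illustrative tree hash: pairwise SHA-256 over per-chunk SHA-256 digests."""
    # Leaf level: one digest per chunk
    level = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        level.append(hashlib.sha256(chunk).digest())

    # Combine neighbours pairwise; an odd digest at the end is carried up unchanged
    while len(level) > 1:
        nxt = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()


if __name__ == "__main__":
    small = b"x" * 1000          # fits in one chunk
    large = b"x" * (MIB + 1)     # spans two chunks
    print(tree_hash(io.BytesIO(small)) == hashlib.sha256(small).hexdigest())
    print(tree_hash(io.BytesIO(large)) == hashlib.sha256(large).hexdigest())
```

The first comparison prints True and the second prints False, so sha256sum can only be expected to match the tree hash for files of at most one chunk.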

Upvotes: 4
