Philip Lombardi

Reputation: 1327

How to calculate SHA256 of a very large file in Go?

I have a very large file (between 30 and 200GB) that I need to compute the checksum for in Go. I've used the common sha256sum program to get a canonical hash of the data, but while trying to do the same thing in Go I have been unable to get the same hash.

I originally started with this function that does work exactly as it is supposed to:

func checksum(file string) (string, error) {
    f, err := os.Open(file)
    if err != nil {
        return "", err
    }

    defer func() {
        _ = f.Close()
    }()

    copyBuf := make([]byte, 1024 * 1024)

    h := sha256.New()
    if _, err := io.CopyBuffer(h, f, copyBuf); err != nil {
        return "", err
    }

    return hex.EncodeToString(h.Sum(nil)), nil
}

However, requirements changed and I now need to do some processing on the buffer as it is read, so I modified the code to something like the following. Now the hash is incorrect, and I am not sure what I am doing wrong.

    f, err := os.Open("<large file>")
    if err != nil {
        panic(err)
    }

    defer func() {
        _ = f.Close()
    }()

    buf := make([]byte, 1024 * 1024)
    h := sha256.New()

    for {
        bytesRead, err := f.Read(buf)
        if err != nil {
            if err != io.EOF {
                panic(err)
            }

            fmt.Println("EOF")
            break
        }

        // do some other work with buf before adding it to the hasher
        // processBuffer(buf)

        fmt.Printf("bytes read: %d\n", bytesRead)
        h.Write(buf)
    }

    fmt.Printf("checksum: %s\n", hex.EncodeToString(h.Sum(nil)))

Anyone have any idea what I am doing wrong?

Upvotes: 2

Views: 4589

Answers (1)

Philip Lombardi

Reputation: 1327

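I figured it out. f.Read may fill only part of the buffer (typically on the last read of the file), and the rest of buf then still holds stale bytes from the previous read. Writing the whole buffer therefore hashes data that is not in the file. The buffer needs to be truncated to the number of bytes actually read before writing:

h.Write(buf[:bytesRead]) instead of h.Write(buf)

The same applies to any processing done on the buffer: only buf[:bytesRead] holds valid data.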

Upvotes: 4
