Reputation: 1327
I have a very large file (between 30 and 200 GB) that I need to compute a SHA-256 checksum for in Go. I've used the common sha256sum
program to get a canonical hash of the data, but while trying to do the same thing in Go I have been unable to reproduce that hash.
I started with this function, which works exactly as it is supposed to:
func checksum(file string) (string, error) {
    f, err := os.Open(file)
    if err != nil {
        return "", err
    }
    defer func() {
        _ = f.Close()
    }()

    copyBuf := make([]byte, 1024*1024)
    h := sha256.New()
    if _, err := io.CopyBuffer(h, f, copyBuf); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}
However, requirements changed: I now need to do some processing on each buffer as it is read, so I modified the code to something like the following. Now the hash is incorrect and I am not sure what I am doing wrong.
f, err := os.Open("<large file>")
if err != nil {
    panic(err)
}
defer func() {
    _ = f.Close()
}()

buf := make([]byte, 1024*1024)
h := sha256.New()
for {
    bytesRead, err := f.Read(buf)
    if err != nil {
        if err != io.EOF {
            panic(err)
        }
        fmt.Println("EOF")
        break
    }
    // do some other work with buf before adding it to the hasher
    // processBuffer(buf)
    fmt.Printf("bytes read: %d\n", bytesRead)
    h.Write(buf)
}
fmt.Printf("checksum: %s\n", hex.EncodeToString(h.Sum(nil)))
Anyone have any idea what I am doing wrong?
Upvotes: 2
Views: 4589
Reputation: 1327
I figured it out. I need to slice the buffer down to the number of bytes actually read before writing:
h.Write(buf[:bytesRead])
instead of h.Write(buf)
The original call hashed the full one-megabyte buffer on every iteration, so on any short read (in particular the final chunk of the file) stale bytes left over from earlier reads were hashed as well.
Upvotes: 4