hdcdigi

Reputation: 87

Java MD5 Copy function generates different digest

I am experimenting with Java and created a small program that copies a file and generates an MD5 checksum. The program works and prints a checksum, but the checksum of the copied file does not match the one the program prints for the original.

I am new to Java and do not understand what the problem is here. Am I writing the wrong buffer to the output file?

package com.application;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.security.MessageDigest;

public class Main {

    static int secure_copy(String src, String dest) throws Exception {
        InputStream inFile = new FileInputStream(src);
        OutputStream outFile = new FileOutputStream(dest);
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[1024];
        int numRead;

        do {
            numRead = inFile.read(buf);
            if (numRead > 0) {
                md.update(buf, 0, numRead);
                outFile.write(buf);
                outFile.flush();
            }
        } while (numRead != -1);

        inFile.close();
        outFile.close();

        BigInteger no = new BigInteger(1, md.digest());
        String result = no.toString(16);
        while(result.length() < 32) {
            result = "0" + result;
        }

        System.out.println("MD5: " + result);
        return 0;
    }

    public static void main(String[] args) {
        try {
            secure_copy(args[0], args[1]);
        } catch (Exception e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}

Output of the program for the source file (correct):

MD5: 503ea121d2bc6f1a2ede8eb47f0d13ef

Checksum of the file produced by the copy function, computed with md5sum:

md5sum file.mov 
56883109c28590c33fb31cc862619977  file.mov
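To run the same check from Java instead of the shell, a minimal sketch might look like the following. The class Md5Check and the method md5Hex are hypothetical names used only for illustration. String.format("%032x", ...) is one way to get the zero-padded, 32-digit lowercase hex form that md5sum prints, without the manual padding loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {

    // Hypothetical helper: hashes a file's bytes with MD5 and returns the
    // lowercase 32-character hex string, the same format md5sum prints.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // hash only the bytes this read filled
            }
        }
        // %032x left-pads with zeros, so no manual padding loop is needed
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws Exception {
        // Mimics md5sum's "<hash>  <filename>" output line
        System.out.println(md5Hex(Paths.get(args[0])) + "  " + args[0]);
    }
}
```

Running it on both the original and the copy gives two hashes that can be compared directly.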

Upvotes: 0

Views: 198

Answers (2)

Michael P

Reputation: 153

On every read from the InputStream, the code changes the data whose hash is being calculated. Instead of calling md.update(buf, 0, numRead); within the loop, it should read the entire file into a byte[] and then call md.update(entireFileByteArray) once. (See this answer for a way to find the appropriate array size ahead of opening the file.)
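A minimal sketch of the whole-file approach described here, assuming the file fits in memory (a single byte[] is limited to about 2 GB and to the available heap). The class and method names are hypothetical. For comparison it also hashes the same bytes in chunks: MessageDigest.update accumulates input until digest() is called, so both methods return the same hash for the same bytes.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class WholeFileMd5 {

    // Reads the entire file into memory, then hashes it with one update call.
    // Assumes the file is small enough to fit in a single byte[].
    static String md5OfWholeFile(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] entireFileByteArray = Files.readAllBytes(file);
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(entireFileByteArray);
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    // Hashes the same data in fixed-size pieces; update() keeps appending
    // to the digest state, so the result matches the one-shot version.
    static String md5InChunks(byte[] data, int chunkSize) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (int off = 0; off < data.length; off += chunkSize) {
            md.update(data, off, Math.min(chunkSize, data.length - off));
        }
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5OfWholeFile(Paths.get(args[0])));
    }
}
```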

Upvotes: 0

Joni

Reputation: 111219

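You are writing the entire buffer to the output file, not just the portion that has data from the latest read. On the final read buf is usually only partly filled, so outFile.write(buf) also writes leftover bytes from the previous read, and the copy ends up larger than the original. The fix is simple: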

        if (numRead > 0) {
            md.update(buf, 0, numRead);
            outFile.write(buf, 0, numRead);
        }
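Applied to the question's method, the fix might look like the sketch below. Beyond the one-line change, it makes a few choices of its own: the method is renamed secureCopy and returns the hex digest instead of printing it, both streams are opened with try-with-resources so they are closed even if an exception is thrown, and the per-chunk flush() is dropped because FileOutputStream does not buffer writes.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.security.MessageDigest;

public class SecureCopy {

    // Copies src to dest and returns the MD5 of the copied bytes as 32 hex digits.
    static String secureCopy(String src, String dest) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[1024];
        // try-with-resources closes both streams even if read/write throws
        try (InputStream inFile = new FileInputStream(src);
             OutputStream outFile = new FileOutputStream(dest)) {
            int numRead;
            while ((numRead = inFile.read(buf)) != -1) {
                // Hash and write only the numRead bytes filled by this read;
                // the rest of buf may still hold data from an earlier read.
                md.update(buf, 0, numRead);
                outFile.write(buf, 0, numRead);
            }
        }
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) {
        try {
            System.out.println("MD5: " + secureCopy(args[0], args[1]));
        } catch (Exception e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

With this change the printed hash and md5sum of the copy should agree, since the copy now contains exactly the bytes that were hashed.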

Upvotes: 2
