buaagg

Reputation: 629

How to calculate the MD5 and SHA checksums of a huge file?

I want to calculate the MD5 and SHA checksums of a series of huge files. Each file is about 1GB, so I want the computation to be as fast as possible.

Could anyone recommend an efficient C++ library?

BTW, when reading the file with fread( buffer, sizeof(char), BUFFER_SIZE, fin ), what value of BUFFER_SIZE is reasonable?

Upvotes: 1

Views: 7413

Answers (3)

krisku

Reputation: 3991

Off the top of my head I do not know of any fast C++ library. Computing a hash is relatively straightforward, so any C library will be just as easy to use (you can easily wrap it in a C++ class yourself). I found the following site, where the author implemented several hashing algorithms in x86 assembly and compared them to "official" C implementations of the same algorithms:

https://www.nayuki.io/page/fast-sha1-hash-implementation-in-x86-assembly
https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly

Those implementations should be a good starting point; after that you just have to make the file I/O as efficient as possible. Memory-mapped I/O is usually very efficient. Alternatively, you could go for a more complex design with two threads: one thread reads chunks from the file and the other hashes the data that has been read. The idea is to keep the process doing something useful at all times, i.e. hashes are calculated while the next chunk is still being read from the file.
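As a rough illustration of the memory-mapped route, here is a minimal sketch. It assumes a POSIX system (open, fstat, mmap, posix_madvise), and the names feed_mapped_file, slice and consume are made up for this example. The consume callback is where the incremental update call of whatever hash library you pick would go.

```cpp
#include <cstddef>
#include <functional>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps `path` read-only and hands its bytes to `consume`
// in slices of at most `slice` bytes. Returns false if a system call fails.
// POSIX only; not tuned or benchmarked.
bool feed_mapped_file(
    const std::string& path, std::size_t slice,
    const std::function<void(const unsigned char*, std::size_t)>& consume) {
    if (slice == 0) return false;

    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (::fstat(fd, &st) != 0) {
        ::close(fd);
        return false;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) {  // mmap rejects zero-length mappings; nothing to feed
        ::close(fd);
        return true;
    }

    void* base = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return false;

    // Hint that the pages will be read front to back; the result is ignored
    // because the hint is only advisory.
    ::posix_madvise(base, size, POSIX_MADV_SEQUENTIAL);

    const unsigned char* bytes = static_cast<const unsigned char*>(base);
    for (std::size_t off = 0; off < size; off += slice) {
        const std::size_t n = (size - off < slice) ? size - off : slice;
        consume(bytes + off, n);
    }

    ::munmap(base, size);
    return true;
}
```

Whether this actually beats a plain fread loop depends on the OS and the disk, so it is worth measuring both on your own files.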

Upvotes: 2

Mats Petersson

Reputation: 129524

I personally would do FILE *pipe = popen("md5sum filename", "r"); [or something to that effect]. It is likely to be as fast as anything else: 1GB of file takes a little while to read, and the calculation is unlikely to use much of your CPU time, so most of the time is spent waiting for the disk to deliver the file.

On my system, I created 6 files of 1GB each, and it takes 2 seconds to checksum each file with md5sum (12 seconds for all 6 files).
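Spelled out a little more, the popen approach could look like the sketch below. It assumes a POSIX system with the md5sum tool on PATH. The function name md5_via_md5sum is made up for this example, and rejecting paths that contain a single quote is a shortcut instead of proper shell escaping.

```cpp
#include <cctype>
#include <stdio.h>
#include <string>

// Hypothetical helper: returns the hex MD5 of `path` as printed by the
// external `md5sum` tool, or an empty string on any failure.
// Assumes POSIX popen/pclose and `md5sum` on PATH.
std::string md5_via_md5sum(const std::string& path) {
    // Shortcut: refuse paths with a single quote instead of escaping them.
    if (path.find('\'') != std::string::npos) return "";
    const std::string cmd = "md5sum -- '" + path + "' 2>/dev/null";

    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return "";

    char line[512];
    std::string out;
    if (fgets(line, sizeof line, pipe)) out = line;
    const int status = pclose(pipe);  // non-zero if md5sum failed or is missing
    if (status != 0 || out.size() < 32) return "";

    // md5sum prints "<32 hex digits>  <filename>"; keep only the digest.
    for (std::size_t i = 0; i < 32; ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(out[i]))) return "";
    }
    return out.substr(0, 32);
}
```

The same shape should work for sha1sum or sha256sum by changing the command name and the expected digest length.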

Upvotes: 2

MKAROL

Reputation: 316

You could use OpenSSL. See Mysticial's answer about hashing a large file with MD5 under "How to create a md5 hash of a string in C?". If you look at the OpenSSL SHA docs, you will see that the SHA functions are used in the same way as the MD5 ones. See: SHA OpenSSL Docs

Upvotes: 2
