discodowney

Reputation: 1507

Comparing two files in C++

I have a function that compares two files to see if they are the same. It reads both files byte by byte and checks that each pair of bytes matches.
The problem I'm having now is that for big files this function takes quite a long time.

What is a better, faster way to check whether the files are the same?

Upvotes: 4

Views: 5770

Answers (5)

ttokic

Reputation: 116

If you are not familiar with hashing, search for the "MD5" or "SHA" algorithms. Hashing is an efficient way to check whether files are identical. All you need is an implementation of one of these algorithms; then compare the resulting digests, for example:

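// md5() stands for whichever hash implementation you pick;
// it is not a standard library function.
if (md5(file1Path) == md5(file2Path))
    std::cout << "Files are equal" << std::endl;
else
    std::cout << "Files are not equal" << std::endl;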

Upvotes: -2

justin

Reputation: 104698

If you really want a brute-force comparison of two files, memory-mapping them (mmap) may help.

If you know the structure of the files you are reading, read the distinctive sections that let you identify them quickly (e.g. a header and the relevant chunks/sections). Of course, you will want to check the files' basic attributes, such as size, before comparing any contents.

If you compare the same files many times, generate hashes (or some other fingerprint) once and compare those instead.

Upvotes: 2

Phil Hannent

Reputation: 12317

Whilst there are a number of examples of cryptographic hash functions such as SHA or MD5, for file comparisons it's better to use a non-cryptographic hash, as it will be faster:

https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cryptographic_hash_functions

The FNV hash is considered fast enough for this purpose:

https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash
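As a rough illustration, a 64-bit FNV-1a hash of a file's contents might look like the sketch below. The function name fnv1a_64_file and the 64 KiB read buffer are arbitrary choices for this example; the offset basis and prime are the published 64-bit FNV constants. Note that an unreadable file hashes the same as an empty one here, which real code would probably want to report separately.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a over the raw bytes of a file.
// A file that cannot be opened yields the same value as an empty file.
std::uint64_t fnv1a_64_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    const std::uint64_t prime = 1099511628211ULL;   // FNV 64-bit prime
    std::vector<char> buf(1 << 16);                 // 64 KiB read buffer
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            hash ^= static_cast<unsigned char>(buf[static_cast<std::size_t>(i)]);
            hash *= prime;                          // wraps modulo 2^64
        }
    }
    return hash;
}
```

Different hashes prove the files differ, but equal hashes only make equality very likely, so a byte comparison is still needed when certainty matters. Hashing also reads every byte of both files, so it mainly pays off when the hashes are stored and reused across several comparisons.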

Upvotes: 0

mah

Reputation: 39807

When your files are not the same, are they likely to be of the same size? If not, you can determine the file sizes right away (fseek to the end, ftell to determine the position), and if they're different then you know they're not the same without comparing the data. If the size is the same, remember to fseek back to the beginning.

If you read your files into large memory buffers and compare each buffer using memcmp(), you will improve performance. You don't have to read the entire file at once; just set a large buffer size and read one block of that size from each file on each iteration of the comparison loop. A typical memcmp implementation compares a machine word (32 bits or more) at a time, rather than one 8-bit byte at a time.
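The buffered approach might be sketched like this. The function name files_equal and the 1 MiB default buffer are illustrative choices, not recommendations from the answer, and treating unreadable files as "not equal" is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: compares two files block by block with memcmp.
// Files that cannot be opened are reported as "not equal".
bool files_equal(const std::string& path_a, const std::string& path_b,
                 std::size_t buffer_size = 1 << 20) {
    if (buffer_size == 0) buffer_size = 1;  // avoid a zero-length read loop
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;

    std::vector<char> buf_a(buffer_size), buf_b(buffer_size);
    for (;;) {
        a.read(buf_a.data(), static_cast<std::streamsize>(buf_a.size()));
        b.read(buf_b.data(), static_cast<std::streamsize>(buf_b.size()));
        const std::streamsize got_a = a.gcount();
        const std::streamsize got_b = b.gcount();
        if (got_a != got_b) return false;  // one file ended before the other
        if (got_a == 0) return true;       // both exhausted, no mismatch found
        if (std::memcmp(buf_a.data(), buf_b.data(),
                        static_cast<std::size_t>(got_a)) != 0)
            return false;                  // first differing block ends the search
    }
}
```

The function returns at the first differing block, so files that diverge early are rejected after a single read from each.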

Upvotes: 7

foxx1337

Reputation: 2026

Read the files in chunks of size X, with X anywhere from 1 to 50 megabytes. Use memcmp() on those chunks.

Upvotes: 0
