Reputation: 196

Detecting duplicate binaries in the same directory (Windows)

I have about 30 BIN files in a directory, ranging from 64KB to 4MB. I need to find out whether there are duplicate files among them. Many files have the same size.

I would like to find if there are binary identical files in there.

Does anyone know a way to do this? I'm on Windows XP Pro.

Thanks!

Upvotes: 1

Answers (6)

Joey

Reputation: 354714

That's pretty easy. You can use two nested for loops on the command line:

for %x in (*) do @(
    for %y in (*) do @(
        if not "%x"=="%y" @(
            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal
        )
    )
)

If you want to use this in a batch file, you need to double the % signs.

The code simply loops twice over all files in the current directory:

for %x in (*) do @(
    for %y in (*) do @(

then, if the two file names aren't equal (if they were, we would be comparing a file with itself, which is trivially equal)

        if not "%x"=="%y" @(

it runs the fc utility, which compares files (/b forces a binary comparison)

            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal

If fc had an exit code of 0 it means that the files were equal (thus duplicates) and in that case the echo after the && is triggered. && means “Just execute the following command if the previous one exited with a 0 exit code”.

And for 30 files this is certainly fast enough, even though every pair gets compared twice (once in each order). I once implemented something more elaborate in batch, but this should suffice.
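
For anyone who would rather script the same pairwise idea outside of cmd, here is a minimal Python sketch. It is not part of the batch approach above; the function name find_equal_pairs is made up for illustration, and it assumes a Python 3 interpreter is available on the machine. Unlike the nested for loops, it visits each unordered pair only once, so every match is printed a single time.

```python
import filecmp
import itertools
import os


def find_equal_pairs(directory="."):
    """Return (name_a, name_b) pairs of regular files with identical bytes.

    Hypothetical helper for illustration; mirrors the nested-loop fc /b idea.
    """
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    pairs = []
    # combinations() yields each unordered pair exactly once, so a file is
    # never compared with itself and no pair is reported twice.
    for a, b in itertools.combinations(names, 2):
        # shallow=False makes filecmp compare file contents instead of
        # trusting matching size and modification time.
        if filecmp.cmp(os.path.join(directory, a),
                       os.path.join(directory, b),
                       shallow=False):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for a, b in find_equal_pairs("."):
        print('"{0}" and "{1}" are equal'.format(a, b))
```

Run it from inside the directory that holds the BIN files. filecmp.cmp returns False right away when the sizes differ, so only same-size files are actually read.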

ETA: Found the other batch; still nowhere publicly explained but I once posted it at Super User.

Upvotes: 3

ghostdog74

Reputation: 342679

You can use fc, or fciv (Microsoft's File Checksum Integrity Verifier) for checksums.

Or you could download the GNU utilities.

Get Textutils, which contains md5sum, and coreutils, which contains sort and uniq. Then run this:

C:\files>md5sum * | sort | uniq -d -w 32
6f2b448730d23fe68876db87f1ddc143 *file.txt

To iterate over the results and do something with them, use a for loop.
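
In the pipeline above, uniq -d prints only repeated lines and -w 32 limits the comparison to the first 32 characters, which is the MD5 hex digest, so one line is shown per group of duplicates. If installing the GNU tools is not an option, the same grouping can be sketched in Python. This is an illustrative equivalent, not the pipeline itself; md5_of_file and duplicate_groups_by_hash are hypothetical names, and unlike the uniq output it lists every file in each group.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups_by_hash(directory="."):
    """Map each MD5 digest shared by 2+ files to the list of file names."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            groups[md5_of_file(path)].append(name)
    # Keep only digests that occur more than once (the "uniq -d" step).
    return {digest: names for digest, names in groups.items()
            if len(names) > 1}


if __name__ == "__main__":
    # The loop is where you would act on each group of duplicates.
    for digest, names in duplicate_groups_by_hash(".").items():
        print("{0}  {1}".format(digest, ", ".join(names)))
```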

Upvotes: 0

goorj

Reputation: 443

Hash them with md5deep (or similar), or try a duplicate file checker:

http://www.portablefreeware.com/index.php?sc=77

Upvotes: 1

CraigTP

Reputation: 44939

Personally, I would sort the files by file size first. Files of different sizes cannot be identical in a binary comparison.

Those that are of the same file size could potentially be the same, so I would then generate a hash of each file's contents (MD5, SHA-1, etc.). Files that produce the same hash are, barring an extremely unlikely collision, identical.
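
A rough sketch of that two-step idea in Python, in case it helps: group by size first, then hash only the files whose size is shared. The names sha1_of_file and find_duplicates are invented for this example, and SHA-1 is just one of the hash choices mentioned above.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory="."):
    """Return sorted lists of file names that share both size and SHA-1."""
    # Step 1: bucket files by size; this needs no file reads at all.
    by_size = defaultdict(list)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(name)

    duplicates = []
    for names in by_size.values():
        if len(names) < 2:
            continue  # a file with a unique size cannot have a duplicate
        # Step 2: hash only the files whose size is shared.
        by_hash = defaultdict(list)
        for name in names:
            by_hash[sha1_of_file(os.path.join(directory, name))].append(name)
        duplicates.extend(sorted(group) for group in by_hash.values()
                          if len(group) > 1)
    return sorted(duplicates)


if __name__ == "__main__":
    for group in find_duplicates("."):
        print(" == ".join(group))
```

With only 30 files the size check saves little time, but it avoids reading any file whose size is unique. If a hash collision is a concern, each reported group can still be confirmed with a byte-by-byte comparison.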

And to keep everything "on-topic" from a programming perspective (otherwise this question is perhaps more suited to superuser.com), here is a C# project that implements a "shell extension" (i.e. additional items in Windows Explorer's context menu) that will compute various hashes of files selected within Windows Explorer:

File Hash Generator Shell Extension

Upvotes: 1