Reputation: 196
I have about 30 BIN files in a directory, varying from 64 KB to 4 MB in size. I need to find out whether there are duplicate files among them. Many of the files have the same size.
I would like to find out whether any of them are binary-identical.
Anyone know a way to do this? I'm under Windows XP Pro.
Thanks!
Upvotes: 1
Views: 411
Reputation: 354714
That's pretty easy. You can use two nested for loops on the command line:
for %x in (*) do @(
    for %y in (*) do @(
        if not "%x"=="%y" @(
            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal
        )
    )
)
If you want to use this in a batch file, you need to double the % signs (%%x and %%y).
The code simply loops twice over all files in the current directory:
for %x in (*) do @(
for %y in (*) do @(
Then, if the two file names aren't equal (a file is always identical to itself, so that comparison is skipped),
if not "%x"=="%y" @(
it runs the fc utility, which compares the files (/b selects a binary comparison):
fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal
If fc had an exit code of 0, it means that the files were equal (thus duplicates), and in that case the echo after the && is triggered. && means “Just execute the following command if the previous one exited with a 0 exit code”.
And for 30 files this is certainly fast enough. I once implemented something more elaborate in batch, but this should suffice.
ETA: Found the other batch; still nowhere publicly explained but I once posted it at Super User.
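For comparison, here is a minimal Python sketch of the same pairwise, byte-for-byte idea. It is not part of the batch solution above: the function name find_identical_pairs is made up for illustration, and it assumes Python 3 is available. Unlike the nested for loops, it visits each unordered pair only once, so a match is reported once rather than in both orders.

```python
import filecmp
import itertools
from pathlib import Path


def find_identical_pairs(directory):
    """Return (name_a, name_b) for every pair of byte-identical files."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    pairs = []
    # combinations() yields each unordered pair exactly once, so a file is
    # never compared with itself and no pair shows up in both orders.
    for a, b in itertools.combinations(files, 2):
        # shallow=False makes filecmp compare the contents (roughly what
        # fc /b does) instead of trusting size and modification time.
        if filecmp.cmp(a, b, shallow=False):
            pairs.append((a.name, b.name))
    return pairs
```

Calling find_identical_pairs(".") returns the matching pairs for the current directory. Like the batch version, this is quadratic in the number of files, which should be no problem for about 30 of them.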
Upvotes: 3
Reputation: 342679
You can use fc, or fciv (for checksums).
Or you could download the GNU utilities:
get Textutils, which contains md5sum, and coreutils, which contains sort and uniq. Then do this:
C:\files>md5sum * | sort | uniq -d -w 32
6f2b448730d23fe68876db87f1ddc143 *file.txt
To iterate over the results and do something with them, use a for loop.
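If the GNU tools are not at hand, the same grouping by checksum can be sketched in Python. This is only an illustration and not part of the pipeline above: duplicates_by_md5 is a hypothetical name, and it assumes Python 3. Note that uniq -d prints a single line per group of duplicates, whereas this sketch keeps every file name in each group.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def duplicates_by_md5(directory):
    """Map each MD5 digest shared by two or more files to their names."""
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # The files here are at most a few MB, so reading each one
            # into memory in a single call is acceptable.
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only digests that occur more than once, like uniq -d.
    return {d: names for d, names in by_digest.items() if len(names) > 1}
```

Looping over duplicates_by_md5(".").items() then gives one digest and the list of matching file names per iteration.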
Upvotes: 0
Reputation: 443
Hash them with Md5Deep (or similar), or try a duplicate file checker,
http://www.portablefreeware.com/index.php?sc=77
Upvotes: 1
Reputation: 44939
Personally, I would sort the files by file size first. Files of different sizes cannot be binary-identical.
Those that are of the same file size could potentially be the same, so I would then generate a hash of each file's contents (MD5, SHA-1, etc.). Files that have the same hash result are identical (barring a hash collision, which is extremely unlikely for files that were not crafted to collide).
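The two steps described above could look roughly like the following Python sketch. It is an illustration under assumptions, not a reference implementation: find_duplicate_groups is a made-up name, SHA-1 is just one of the hashes mentioned, and Python 3 is assumed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_groups(directory):
    """Return lists of file names whose size and SHA-1 digest both match."""
    # Step 1: bucket the files by size.
    by_size = defaultdict(list)
    for path in Path(directory).iterdir():
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        # Step 2: hash only the files that share their size with another.
        by_hash = defaultdict(list)
        for path in same_size:
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            by_hash[digest].append(path.name)
        groups.extend(sorted(n) for n in by_hash.values() if len(n) > 1)
    return sorted(groups)
```

The size check costs only a directory lookup per file, so files with a unique size are never read at all.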
And to keep everything "on-topic" from a programming perspective (otherwise this question is perhaps more suited to superuser.com), here is a C# project that implements a "shell extension" (i.e. additional items in Windows Explorer's context menu) that will compute various hashes of files selected within Windows Explorer:
File Hash Generator Shell Extension
Upvotes: 1
Reputation: 7200
You don't specify how this should happen. This may be a question that belongs on superuser.com, but you could use a tool like WinMerge.
If you have to do this in code, you could calculate a hash value for each file and compare the hash values.
Upvotes: 0
Reputation: 19225
Generate a hash (MD5 or SHA-1) of each file and compare.
Obviously, if two files are different sizes, you can rule them out immediately.
Upvotes: 0