0lesya

Reputation: 247

Windows batch file to list all duplicates (and the original file) in a directory tree and sort them

I have to check a directory tree for duplicate files and write all of them to a List.txt file. But my script seems to skip one of the file locations in each group. (For example, if there are 4 duplicate files, only 3 of them appear in the list.)

If I'm not mistaken, it's the location of the "previousFile" of the last comparison that is missing. How do I write it to the list, too?

Also, how can I group paths in the List.txt by the filename so that it looks something like this:

File fileNameA.txt :
C:\path1\fileNameA.txt
C:\path2\fileNameA.txt
C:\path3\fileNameA.txt

File fileNameB.txt :
C:\path1\fileNameB.txt
C:\path2\fileNameB.txt
C:\path3\fileNameB.txt
C:\path4\fileNameB.txt

File fileNameC.txt :
C:\path1\fileNameC.txt
C:\path2\fileNameC.txt

...

?

That's my script so far:

@echo off

setlocal disableDelayedExpansion

set root=%1

IF EXIST List.txt del /F List.txt

set "previousTest=none"
set "previousFile=none"

for /f "tokens=1-3 delims=:" %%A in (
  '"(for /r "%root%" %%F in (*) do @echo %%~zF:%%~fF:)|sort"'
) do (
  set "currentTest=%%A"
  set "currentFile=%%B:%%C"
  setlocal enableDelayedExpansion
  set "match="
  if !currentTest! equ !previousTest! fc /b "!previousFile!" "!currentFile!" >nul && set match=1
  if defined match (
    echo File "!currentFile!" >> List.txt
    endlocal
  ) else (
    endlocal
    set "previousTest=%%A"
    set "previousFile=%%B:%%C"
  )
)

Upvotes: 2

Views: 1323

Answers (1)

JosefZ

Reputation: 30153

You need to count the matches and, on the first match in each group, echo the previous file name before echoing the current one.

Note the changes in '"(for /r "%root%" %%F in (*) do @echo(%%~nxF?%%~zF?%%~fF?)|sort"':

  • used ? (question mark) as the delimiter, because it is a reserved character that cannot appear in file names (see Naming Files, Paths, and Namespaces);
  • added the %%~nxF? prefix so that the output is sorted properly by file name, even in my sloppy test folder structure (see the sample output below).
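If you want to see what the FOR /F loop actually receives before it splits each line on "?", a minimal sketch is to run the inner command on its own. It assumes, like the script below, that the root folder is the first argument and falls back to the current folder; it only prints the sorted name?size?fullpath? lines, so you can check that equal names end up on adjacent lines:

```batch
@echo off
rem Sketch: print the intermediate lines that the FOR /F loop splits on "?".
rem The root folder is the first argument; it falls back to the current folder.
setlocal EnableExtensions DisableDelayedExpansion
set "root=%~1"
if not defined root set "root=%CD%"

rem One line per file as name?size?fullpath? and then sorted, so equal names are adjacent.
(for /r "%root%" %%F in (*) do @echo(%%~nxF?%%~zF?%%~fF?)|sort
endlocal
```

Each printed line corresponds to one iteration of the FOR /F loop, where the first three "?"-separated fields become %%A, %%B and %%C.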

The sample output further below also shows that even cmd poison characters (like &, %, !, etc.) in file names are handled properly, as long as delayed expansion stays disabled.

@ECHO OFF
SETLOCAL EnableExtensions DisableDelayedExpansion
set "root=%~1"
if not defined root set "root=%CD%"

set "previousTest="
set "previousFile="
set "previousName="
set "match=0"

for /f "tokens=1-3 delims=?" %%A in (
  '"(for /r "%root%" %%F in (*) do @echo(%%~nxF?%%~zF?%%~fF?x)|sort"'
) do (
    set "currentName=%%A"
    set "currentTest=%%B"
    set "currentFile=%%C"
    Call :CompareFiles
)
ENDLOCAL
goto :eof

:CompareFiles
  if /I "%currentName%" equ "%previousName%" ( set /A "match+=1" ) else ( set "match=0" )
  if %match% GEQ 1 (
      if %match% EQU 1 echo FILE "%previousFile%" %previousTest%
      echo      "%currentFile%" %currentTest%
  ) else (
      set "previousName=%currentName%"
      set "previousTest=%currentTest%"
      set "previousFile=%currentFile%"
  )
goto :eof

The script above lists all files with duplicate names, regardless of their size and content. Sample output:

FILE "d:\bat\cliPars\cliParser.bat" 1078
     "d:\bat\files\cliparser.bat" 12303
     "d:\bat\Unusual Names\cliparser.bat" 12405
     "d:\bat\cliparser.bat" 335
FILE "d:\bat\Stack33721424\BÄaá^ cčD%OS%Ď%%OS%%(%1!)&°~%%G!^%~2.foo~bar.txt" 120
     "d:\bat\Unusual Names\BÄaá^ cčD%OS%Ď%%OS%%(%1!)&°~%%G!^%~2.foo~bar.txt" 120
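The question asks for a slightly different layout: a "File name :" header per group followed by one path per line. A possible sketch, assuming the same name-based grouping as the script above, only changes what :CompareFiles prints. The paths and the header name stay quoted on purpose, so that names containing &, ) or similar characters cannot break the ECHO commands:

```batch
@ECHO OFF
rem Sketch: group files with duplicate names in the layout requested in the
rem question: a "File NAME :" header, then one quoted path per line.
SETLOCAL EnableExtensions DisableDelayedExpansion
set "root=%~1"
if not defined root set "root=%CD%"

set "previousName="
set "previousFile="
set "match=0"

rem Each line is "name?size?fullpath?x"; "?" cannot occur in Windows file names.
for /f "tokens=1-3 delims=?" %%A in (
  '"(for /r "%root%" %%F in (*) do @echo(%%~nxF?%%~zF?%%~fF?x)|sort"'
) do (
    set "currentName=%%A"
    set "currentFile=%%C"
    Call :CompareFiles
)
ENDLOCAL
goto :eof

:CompareFiles
  rem Consecutive lines with the same name (case-insensitive) form one group.
  if /I "%currentName%" equ "%previousName%" ( set /A "match+=1" ) else ( set "match=0" )
  if %match% GEQ 1 (
      rem First duplicate of a group: print the header and the group's first file.
      if %match% EQU 1 (
          echo(
          echo File "%previousName%" :
          echo "%previousFile%"
      )
      echo "%currentFile%"
  ) else (
      set "previousName=%currentName%"
      set "previousFile=%currentFile%"
  )
goto :eof
```

To get List.txt, redirect the output, e.g. group-dupes.bat "C:\root" > List.txt (group-dupes.bat is a hypothetical script name). Because every group starts with an empty ECHO( line, the file begins with a blank line and the groups are separated by one.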

To list all files with duplicate names and the same size, regardless of their content, change the beginning of :CompareFiles as follows:

:CompareFiles
  REM if /I "%currentName%" equ "%previousName%" (
  if /I "%currentTest%%currentName%" equ "%previousTest%%previousName%" (
      set /A "match+=1"
      REM fc /b "%previousFile%" "%currentFile%" >nul && set /A "match+=1"
  ) else ( set "match=0" )

To list all files with duplicate names, the same size and identical binary content:

:CompareFiles
  REM if /I "%currentName%" equ "%previousName%" (
  set "same="
  if /I "%currentTest%%currentName%" equ "%previousTest%%previousName%" (
      REM set /A "match+=1"
      fc /b "%previousFile%" "%currentFile%" >nul && set "same=1"
  )
  REM count binary matches only; a file that differs starts a new group
  if defined same ( set /A "match+=1" ) else ( set "match=0" )

Edit: If the file name doesn't matter (only its content), you could apply the following changes to the FOR loop and the :CompareFiles subroutine:

@ECHO OFF
SETLOCAL EnableExtensions DisableDelayedExpansion
set "root=%~1"
if not defined root set "root=%CD%"

set "previousTest="
set "previousFile="
set "match=0"

for /f "tokens=1-2 delims=?" %%A in (
  '"(for /r "%root%" %%F in (*) do @echo(%%~zF?%%~fF?)|sort"'
) do (
    set "currentTest=%%A"
    set "currentFile=%%B"
    rem optional: skip all files of zero length
    if %%A GTR 0 Call :CompareFiles
)
ENDLOCAL
goto :eof

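:CompareFiles
  set "same="
  if /I "%currentTest%" equ "%previousTest%" (
      fc /b "%previousFile%" "%currentFile%" >nul && set "same=1"
  )
  rem count binary matches only; a file that differs starts a new group
  if defined same ( set /A "match+=1" ) else ( set "match=0" )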
  if %match% GEQ 1 (
      if %match% EQU 1 echo FILE "%previousFile%" %previousTest%
      echo      "%currentFile%" %currentTest%
  ) else (
      set "previousTest=%currentTest%"
      set "previousFile=%currentFile%"
  )
goto :eof

Upvotes: 2
