RolfBly

Reputation: 3862

Remove duplicates from comma separated list in batch file

I have a batch file that (among other things) turns a list like this:

'foo_ph1-1.tif', 'foo_ph2-1', 'foo_ph2-2'

into a list like this, in a local variable called INVNOS:

'fooph1', 'fooph2', 'fooph2'

I want to remove the duplicates from the second list. I've been trying to do this while building the list, based on the answers to this question, to no avail.

Here's how I make the list.

@echo off
setlocal ENABLEDELAYEDEXPANSION
for %%f in ("*.tif") do @echo %%~nf>>list.lst
set FNAMES=
set INVNOS=
for /f %%i in ('type list.lst') do (
    set FNAMES=!FNAMES!'%%i.jpg', 
    for /f "tokens=1 delims=-" %%a in ("%%i") do (
        set BEFORE_HYPHEN=%%a
        set INVNOS=!INVNOS!'!BEFORE_HYPHEN:_=!', 
    ) 
)
set "FNAMES=%FNAMES:~0,-2%" 
set "INVNOS=%INVNOS:~0,-2%"
echo %INVNOS%
endlocal

Solutions with findstr won't work because I need to initialize INVNOS with an empty string, and I get stuck on the difference between % and !, and on slicing, inside the for loop.

I know this is easy in Python, but I'd like to do it with what's native (Windows 10/Windows Server), so CMD or PowerShell.

Any suggestions?

Just to sketch the bigger picture: INVNOS (inventory numbers) is derived from directories full of TIF files, so we can check whether or not they exist in some SQL database.

Upvotes: 1

Views: 695

Answers (3)

lit

Reputation: 16236

If you wanted to step up to PowerShell, something like this could be done in a .bat file script. Of course, it would be easier to write and maintain if it were all written in PowerShell.

=== doit.bat

@ECHO OFF
FOR /F "delims=" %%A IN ('powershell -NoLogo -NoProfile -Command ^
    "(Get-ChildItem -File -Filter '*.tif' |" ^
        "ForEach-Object { '''' + $($_.Name.Split('-')[0].Replace('_','')) + '''' } |" ^
        "Sort-Object -Unique) -join ','"') DO (
    SET "INVNOS=%%~A"
)
ECHO INVNOS is set to %INVNOS%
EXIT /B

Get-ChildItem produces a list of all the *.tif files in the directory. Split('-') does what "delims=-" does in a FOR loop, and the [0] subscript selects everything up to the first '-' character in the file name. Replace removes the '_' characters. Sort-Object -Unique removes duplicates to produce a unique list. The -join operator converts the list of names to a single, comma-delimited string. The resulting string is stored in the INVNOS variable.

Do you really want APOSTROPHE characters around each name in the list?

Upvotes: 0

Magoo

Reputation: 79982

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
:: The values assigned to these variables suit my system and test environment
SET "sourcedir=u:\your files"
SET "tempfile=%temp%\tempfile.txt"

:: remove variables starting :
FOR  /F "delims==" %%a In ('set : 2^>Nul') DO SET "%%a="
(for %%f in ("%sourcedir%\*.tif") do echo %%~nf)>"%tempfile%"
set "FNAMES="
set "INVNOS="
for /f "usebackqdelims=" %%i in ("%tempfile%") do (
    set FNAMES=!FNAMES!'%%i.jpg', 
    for /f "tokens=1 delims=-" %%a in ("%%i") do (
        set "BEFORE_HYPHEN=%%a"
        SET "before_hyphen=!BEFORE_HYPHEN:_=!"
        IF NOT DEFINED :!BEFORE_HYPHEN! set "INVNOS=!INVNOS!'!BEFORE_HYPHEN:_=!', "&SET ":!BEFORE_HYPHEN!=Y"
    ) 
)
set "FNAMES=%FNAMES:~0,-2%" 
set "INVNOS=%INVNOS:~0,-2%"
echo %INVNOS%

IF DEFINED tempfile DEL "%tempfile%"
GOTO :EOF

You would need to change the value assigned to sourcedir to suit your circumstances. The listing uses a setting that suits my system.

I deliberately include spaces in names to ensure that the spaces are processed correctly.

%tempfile% is used temporarily and is a filename of your choosing.

The usebackq option is only required because I chose to add quotes around the source filename.

It is standard practice on SO to use the syntax set "var=value" for string assignments, as this ensures stray trailing spaces on the line are ignored.

Note the evil trailing space in OP's code on the set INVNOS... line within the for ... %%a loop.
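To make the trailing-space point concrete, here is a minimal sketch. PLAIN and QUOTED are hypothetical variable names used only for illustration, and the & rem parts are there only to make the otherwise invisible trailing space visible in the listing.

```batch
@echo off
setlocal
rem Unquoted: the space before the & becomes part of the value.
set PLAIN=value & rem the trailing space is stored
rem Quoted: anything after the closing quote is ignored.
set "QUOTED=value" & rem the trailing space is dropped
rem The brackets make the stored values easy to compare.
echo [%PLAIN%]
echo [%QUOTED%]
endlocal
```

This prints [value ] for the unquoted assignment and [value] for the quoted one. In a list built by concatenation, that stray space ends up inside the string.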

Given OP's original filename list, foo_ph1-1.tif foo_ph2_1 foo_ph2-2, the processing should produce fooph1 fooph21 fooph2, not fooph1 fooph2 fooph2 as claimed.

My testing included foo_ph2-2.tif

The code is essentially the same, but first clearing any environment variables that start :, on the Irish principle.

The temporary file nominated is recreated, avoiding the (unfulfilled) requirement to first delete it.

BEFORE_HYPHEN is explicitly expunged of underscores before the if not defined test is applied. I selected : because : can't be part of a filename. Once the name is applied to the invnos list, the :!BEFORE_HYPHEN! variable is established to prevent further accumulation of repeat BEFORE_HYPHEN values into invnos.
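The marker-variable idea can be tried in isolation with a sketch like the one below. It makes some assumptions: the three base names are hard-coded stand-ins for the real *.tif files, and KEY and RESULT are hypothetical names that are not part of the script above.

```batch
@echo off
setlocal EnableDelayedExpansion
set "RESULT="
rem Hypothetical sample base names standing in for the *.tif files.
for %%n in (foo_ph1-1 foo_ph2-1 foo_ph2-2) do (
    rem Keep only the part before the first hyphen.
    for /f "tokens=1 delims=-" %%a in ("%%n") do (
        set "KEY=%%a"
        rem Strip the underscores before testing for a duplicate.
        set "KEY=!KEY:_=!"
        rem A marker variable named :KEY records that KEY was already added.
        if not defined :!KEY! (
            set "RESULT=!RESULT!'!KEY!', "
            set ":!KEY!=Y"
        )
    )
)
rem Drop the trailing comma and space.
if defined RESULT set "RESULT=!RESULT:~0,-2!"
echo !RESULT!
endlocal
```

With the sample names this prints 'fooph1', 'fooph2'. The second fooph2 is skipped because its marker variable already exists.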

Upvotes: 0

Stephan

Reputation: 56155

I would approach the problem differently:

@echo off
setlocal ENABLEDELAYEDEXPANSION
for %%f in (*.tif) do (
  for /f "delims=-" %%g in ("%%~nf") do set "~%%g=."
)
for /f "delims=~=" %%a in ('set ~') do set "INVNOS='%%a', !INVNOS!"
set "INVNOS=%INVNOS:~0,-2%"
echo %INVNOS:_=%

The trick is to define a variable for each filename: the variable names contain the filenames. A variable name can only exist once, so by definition there are no duplicates.

With another for loop, extract the names from the defined variables and join them. The underscores can be deleted in one go instead of removing them from each substring.

When needed, you can delete the variables with for /f "delims==" %%a in ('set ~') do set "%%a=", but they are destroyed anyway when the script ends. (Use the same line before the first loop if you want to be sure that no variable starting with ~ was already defined by accident.)
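Putting the cleanup line and the variable-name trick together might look like the sketch below. It makes some assumptions: the base names are hard-coded stand-ins for the real *.tif files, LIST is a hypothetical variable name, and the 2^>nul redirections are an addition to hide the error message set gives when no matching variable exists. It also appends rather than prepends, so the entries stay in the order in which set lists them.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Clear any stray variables whose names start with ~ before using the prefix.
for /f "delims==" %%a in ('set ~ 2^>nul') do set "%%a="
rem Hypothetical sample base names standing in for the *.tif files.
for %%n in (foo_ph1-1 foo_ph2-1 foo_ph2-2) do (
    rem One variable per prefix; setting it again for a repeat changes nothing.
    for /f "delims=-" %%g in ("%%n") do set "~%%g=."
)
set "LIST="
rem Each output line looks like ~name=. so the first token is the bare name.
for /f "delims=~=" %%a in ('set ~ 2^>nul') do set "LIST=!LIST!'%%a', "
rem Drop the trailing comma and space, then remove all underscores in one go.
if defined LIST set "LIST=!LIST:~0,-2!"
if defined LIST echo !LIST:_=!
endlocal
```

With the sample names, LIST should end up containing each prefix exactly once.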

Upvotes: 1
