JavaSheriff

Reputation: 7687

batch script - to remove duplicate tokens in file

I have a text file that contains duplicate tokens. I would like to create a new text file without the duplicate tokens (keeping the delimiters).

The delimiter is: ~@^*^@~
Example file:

aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~xxx~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~bbb

Result should be:

aaa~@^*^@~bbb~@^*^@~xxx

I found a script that removes duplicate lines:

==================================
@echo off > outfile
if %1'==' echo which file? && goto :eof
if not exist %1 echo %1 not found && goto :eof

for /f "tokens=* delims= " %%a in (%1) do (
    find "%%a" < outfile > nul
    if errorlevel 1 echo %%a >> outfile
)

The script works nicely for duplicate lines, so I modified the delims option from:

 "tokens=* delims="

to

"tokens=* delims=~@^*^@~"

But it won't work. What am I doing wrong? Is one of the delimiter characters reserved?
Thank you for any suggestions.

Upvotes: 0

Views: 1162

Answers (1)

dbenham

Reputation: 130919

The FOR /F DELIMS option treats each character in the list as a separate delimiter. You cannot use a sequence of characters as a single delimiter, so it will not help in your case.
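
To make the distinction concrete, here is a small sketch in Python. Python is an assumption for illustration only; this is not how FOR /F is implemented. The sample line and the token `b@b` are made up. Splitting on a set of characters roughly mimics a character-list delimiter, while splitting on the whole string is what the question actually needs.

```python
import re

DELIM = "~@^*^@~"
# Hypothetical sample line; the middle token deliberately contains '@'.
line = "aaa~@^*^@~b@b~@^*^@~aaa"

# Character-set split: any run of ~ @ ^ * separates tokens,
# roughly how a list of single-character delimiters behaves.
by_chars = [t for t in re.split(r"[~@^*]+", line) if t]

# Sequence split: only the full 7-character string separates tokens.
by_sequence = line.split(DELIM)

print(by_chars)     # ['aaa', 'b', 'b', 'aaa']
print(by_sequence)  # ['aaa', 'b@b', 'aaa']
```

With a character list, a token that happens to contain any one of the delimiter characters is broken apart, which is why a multi-character delimiter cannot be expressed that way.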

Windows batch is a marginal text processor for simple tasks. You have a particularly nasty problem for a Windows batch file. It might be doable, but the code would be complicated and slow at best.

I strongly advise you use some other tool better suited for text processing. I believe any of the following could be used:

Windows batch is probably about the worst choice you could make, especially for your problem. (This is coming from someone who really enjoys using batch.)
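
As a rough sketch of how short the task becomes in a tool built for text processing, here is one possible version in Python. Python is an assumption here, and the function name `dedupe_tokens` is hypothetical. It splits on the full delimiter string, keeps the first occurrence of each token in order, and joins the survivors with the same delimiter.

```python
DELIM = "~@^*^@~"

def dedupe_tokens(text, delim=DELIM):
    """Drop repeated tokens, keeping first occurrences in their original order."""
    seen = set()
    kept = []
    for token in text.split(delim):
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return delim.join(kept)

# Shortened version of the example from the question.
sample = "aaa~@^*^@~bbb~@^*^@~aaa~@^*^@~xxx~@^*^@~bbb"
print(dedupe_tokens(sample))  # aaa~@^*^@~bbb~@^*^@~xxx
```

To apply this to a file, one would read the file contents into a string, pass it to the function, and write the result to a new file. A trailing newline in the input would become part of the last token, so it is probably worth stripping it first.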

Upvotes: 2
