Reputation: 491
I want to loop through a folder that contains text files and merge them together.
When they are merged, I want to remove duplicates and sort them.
How would I accomplish this with a batch script?
Upvotes: 0
Views: 2600
Reputation: 24299
As long as the target filename doesn't match the wildcard spec, the easiest way to do this is:
copy /b file?.txt new_file.txt
The /b means copy in "binary" mode. Otherwise the default is /a, which stops copying each source file at its first Ctrl+Z and appends a Ctrl+Z to the end of the output file.
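To make the /a versus /b difference concrete, here is a rough Python model of the two modes. It is an illustration only: the function names are hypothetical and this is not how COPY is implemented. Binary mode joins the bytes unchanged, while ASCII mode keeps each source only up to its first Ctrl+Z and appends one Ctrl+Z to the result.

```python
CTRL_Z = b"\x1a"  # the end-of-file character that ASCII mode honours


def concat_binary(sources: list[bytes]) -> bytes:
    """Model of COPY /b: source bytes are joined unchanged."""
    return b"".join(sources)


def concat_ascii(sources: list[bytes]) -> bytes:
    """Model of COPY /a: keep each source up to its first Ctrl+Z, then append one Ctrl+Z."""
    kept = [src.split(CTRL_Z, 1)[0] for src in sources]
    return b"".join(kept) + CTRL_Z
```

For example, concat_ascii([b"a\r\n", b"b\x1ac"]) returns b"a\r\nb\x1a", dropping the "c" that followed the Ctrl+Z, while concat_binary keeps every byte.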
If, as the comment below indicates, there is some fear that one or more files may not end properly with a CRLF, then an alternate solution is
(for %i in (file?.txt) do type %i)>new_file.txt
If the filenames have spaces or other odd characters, you might need to quote them, like this:
(for %i in (*.txt) do type "%i")>new_file.txt
But that is only part of the answer. To remove duplicate lines, there are several solutions that use only batch files or PowerShell, but the simplest is really to grab the GnuWin32 sort utility, which can be downloaded from SourceForge. Then the answer becomes simply:
(for %i in (*.txt) do type "%i")|sort -u >new_file.txt
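A side benefit is that GNU sort is an extremely useful utility in its own right. Note that Windows ships its own sort.exe, which is usually found first on the PATH, so you may need to invoke the GnuWin32 sort by its full path.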
Upvotes: 2
Reputation: 130819
Lavinio's solution will not work properly if the last line of some files is not terminated by a line feed: that line gets joined to the first line of the next file.
Here is a simple command (no batch file needed) that will safely concatenate all the files even if their last lines are not terminated by a line feed. FINDSTR "^" matches every line and appends a CR/LF to a last line that lacks one. Double up the percents if you run it from within a batch file.
>merged.tmp (for %F in (*.txt) do findstr "^" "%F")
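The failure mode is easy to see in a small Python model (illustrative only; the function names are hypothetical). Joining file contents byte for byte, as COPY /b does, glues an unterminated last line onto the next file's first line, whereas terminating each file's last line first keeps the lines separate.

```python
def concat_raw(texts: list[str]) -> str:
    """Join file contents unchanged, as COPY /b does."""
    return "".join(texts)


def concat_terminated(texts: list[str]) -> str:
    """Join file contents, first ending any unterminated last line with CR/LF."""
    parts = []
    for text in texts:
        if text and not text.endswith("\n"):  # last line has no line feed
            text += "\r\n"
        parts.append(text)
    return "".join(parts)
```

With files = ["one\r\ntwo", "three\r\n"], concat_raw yields the lines one and twothree, while concat_terminated yields one, two and three.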
If you want to sort and remove duplicate lines, then PA has a PowerShell solution. Here is a batch solution that sorts and removes duplicate lines. Note that the Windows SORT command is case insensitive, so the duplicate removal is also case insensitive.
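@echo off
setlocal disableDelayedExpansion
rem Concatenate all .txt files; FINDSTR "^" appends CR/LF to an unterminated last line
>merged.tmp (for %%F in (*.txt) do findstr "^" "%%F")
rem Sort the combined lines; /rec 8192 allows lines up to 8192 characters
sort /rec 8192 merged.tmp /o merged.tmp2
rem Write a line only if it differs (ignoring case) from the previous one.
rem Delayed expansion is toggled per line so that ! characters in the data survive.
>merged.txt (
  for /f delims^=^ eol^= %%A in (merged.tmp2) do (
    set "newLn=%%A"
    setlocal enableDelayedExpansion
    if /i "!newLn!" neq "!ln!" (
      endlocal
      set "ln=%%A"
      echo %%A
    ) else endlocal
  )
)
del merged.tmp merged.tmp2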
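The duplicate-removal loop only works because its input is already sorted: each line is compared with the last line written, so equal lines must be adjacent. Here is a rough Python model of that logic (the function name is hypothetical; str.lower stands in for the case-insensitive comparisons of SORT and IF /I, and Python's ordering only approximates SORT's collation).

```python
def dedupe_sorted_case_insensitive(lines: list[str]) -> list[str]:
    """Model of the FOR /F loop: keep a line only if it differs, ignoring case,
    from the previously written line."""
    result: list[str] = []
    previous = None  # plays the role of the undefined ln variable at the start
    for line in sorted(lines, key=str.lower):  # stands in for the case-insensitive SORT
        if not line:
            continue  # FOR /F skips empty lines
        if previous is None or line.lower() != previous.lower():
            result.append(line)
            previous = line
    return result
```

For example, ["b", "A", "a", "", "B", "c"] becomes ["A", "b", "c"]: blank lines are dropped because FOR /F skips them, so they do not survive the batch version either.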
Upvotes: 1
Reputation: 42414
Add this to your cmd file:
set filter=*.txt
set target=new_file.txt
if exist newfile.tmp del newfile.tmp
rem build up the concatenation (/b avoids the trailing Ctrl+Z of ASCII mode)
for %%a in (%filter%) do call :concat "%%a"
rem REMOVE THE SOURCE FILES! (careful please!)
rem this runs before the rename, because the target also matches the filter
for %%a in (%filter%) do del /Q "%%a"
ren newfile.tmp "%target%"
goto :done
:concat
if EXIST newfile.tmp ( copy /b newfile.tmp+%1 newfile.tmp ) else ( copy /b %1 newfile.tmp )
goto :EOF
:done
echo ready
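Because the target new_file.txt itself matches *.txt, the order of the delete and rename steps matters: if the rename happens first, the delete loop removes the merged result as well. Here is a minimal Python sketch of the safe order, using the same temporary and target names as the script; the function merge_and_remove is hypothetical and only illustrates the sequencing.

```python
import glob
import os
import tempfile


def merge_and_remove(folder: str, pattern: str = "*.txt", target: str = "new_file.txt") -> None:
    """Concatenate every file matching pattern into target, then delete the sources.

    Data is collected in newfile.tmp, which does not match pattern, and the
    sources are deleted before the rename, so the merged target survives.
    """
    tmp_path = os.path.join(folder, "newfile.tmp")
    sources = sorted(glob.glob(os.path.join(folder, pattern)))  # sorted for a deterministic order
    with open(tmp_path, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                out.write(src.read())  # plain byte copy, like COPY /b
    for path in sources:  # the "REMOVE FILES" step
        os.remove(path)
    os.replace(tmp_path, os.path.join(folder, target))  # the REN step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name, data in [("a.txt", b"one\r\n"), ("b.txt", b"two\r\n")]:
            with open(os.path.join(folder, name), "wb") as f:
                f.write(data)
        merge_and_remove(folder)
        print(os.listdir(folder))  # only new_file.txt is left
```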
Upvotes: 0
Reputation: 29339
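Complementing lavinio's answer, to meet the OP's requirement to "remove duplicates and sort them", use the PowerShell sort and Get-Unique cmdlets after concatenating the files. The parentheses make Get-Content read the whole file before the redirection starts writing to it:
(gc allfiles.txt) | sort | get-unique > allfiles.txt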
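Get-Unique only removes adjacent duplicates, which is why the input is sorted first. It also compares strings case-sensitively, while sort (Sort-Object) is case-insensitive by default, so lines that differ only in case all survive. Here is a rough Python model of the pipeline (the function name is hypothetical; str.lower only approximates Sort-Object's culture-aware ordering).

```python
def powershell_sort_get_unique(lines: list[str]) -> list[str]:
    """Rough model of `sort | Get-Unique` on strings."""
    ordered = sorted(lines, key=str.lower)  # case-insensitive sort, like Sort-Object's default
    result: list[str] = []
    for line in ordered:
        # Get-Unique: drop a line only if it exactly equals (case-sensitively) the previous one
        if not result or line != result[-1]:
            result.append(line)
    return result
```

For example, ["b", "a", "b", "A"] becomes ["a", "A", "b"]. If case-insensitive removal is wanted instead, sort -Unique (Sort-Object -Unique) is an alternative.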
Upvotes: 1