Jacqueline

Reputation: 491

Loop through folder and merge files?

I want to loop through a folder that contains text files and merge them together.

Once they are merged, I want to remove duplicates and sort them.

How would I accomplish this with a batch script?

Upvotes: 0

Views: 2600

Answers (4)

lavinio

Reputation: 24299

As long as the target filename doesn't match the wildcard spec, the easiest way to do this is:

copy /b file?.txt new_file.txt

The /b means copy in "binary" mode. Without it, concatenation defaults to /a (ASCII mode), which stops copying each source file at the first Ctrl+Z and appends a Ctrl+Z to the end of the destination file.
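To make the difference concrete, here is a small Python model of the two modes exactly as described above. It is a simplification that only captures the Ctrl+Z rules (the real copy command has more behavior), and the function names are hypothetical.

```python
CTRL_Z = b"\x1a"  # the DOS end-of-file marker (0x1A)


def concat_binary(parts: list[bytes]) -> bytes:
    """Model of `copy /b a+b dest`: every source byte is copied verbatim."""
    return b"".join(parts)


def concat_ascii(parts: list[bytes]) -> bytes:
    """Model of ASCII-mode concatenation: each source is read only up to
    its first Ctrl+Z, and one Ctrl+Z is appended to the destination."""
    body = b"".join(p.split(CTRL_Z, 1)[0] for p in parts)
    return body + CTRL_Z


if __name__ == "__main__":
    files = [b"alpha\r\n\x1atrailing junk", b"beta\r\n"]
    print(concat_binary(files))  # junk after the Ctrl+Z is kept
    print(concat_ascii(files))   # junk dropped, Ctrl+Z added at the end
```

Binary mode is the safe choice for merging text files, because it never drops data and never adds the trailing marker.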

If, as the comment below points out, one or more files may not end properly with a CRLF, then an alternative solution is:

(for %i in (file?.txt) do type %i)>new_file.txt

If the filenames have spaces or other odd characters, you might need to quote them. With *.txt, also give the output file a different extension, so that it is not picked up by the wildcard itself:

(for %i in (*.txt) do type "%i")>merged.out

But that is only part of the answer. To remove duplicate lines, there are several solutions that use only batch files or PowerShell, but the simplest is really to grab the GnuWin32 sort utility, which is available from SourceForge. Make sure it is found before Windows' own sort.exe (for example, by calling it with its full path). Then the answer becomes simply:

(for %i in (*.txt) do type "%i")|sort -u>merged.out
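For comparison, the same "merge, then keep each distinct line once, in sorted order" logic can be sketched in Python. This is only an illustration, assuming UTF-8 text files; `merged_sorted_unique` is a hypothetical name, and Python's default string ordering corresponds to byte-order sorting (like `sort -u` in the C locale), not to locale-aware collation.

```python
import glob


def merged_sorted_unique(pattern: str) -> list[str]:
    """Read every file matching `pattern` and return the distinct lines
    in sorted order, roughly what `... | sort -u` produces."""
    lines: set[str] = set()
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            # splitlines() also copes with a last line that has no newline
            lines.update(f.read().splitlines())
    return sorted(lines)


if __name__ == "__main__":
    for line in merged_sorted_unique("*.txt"):
        print(line)
```

Collecting lines into a set removes duplicates regardless of where they occur, so the sort is only needed for the output order.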

A side benefit is that GNU sort is an extremely useful utility in its own right.

Upvotes: 2

dbenham

Reputation: 130819

Lavinio's solution will not work properly if the last line in some files is not terminated by a line feed.

Here is a simple command (no batch file needed) that will safely concatenate all the files, even if their last lines are not terminated by a line feed. Double the percent signs if you run it from within a batch file.

>merged.tmp (for %F in (*.txt) do type "%F")
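The problem being guarded against is that an unterminated last line runs straight into the first line of the next file. As an illustration only, here is a Python sketch that handles this explicitly by adding CRLF after any file that does not end in a line feed; `concat_text_files` is a hypothetical name, and this is not a description of what `type` does internally.

```python
import glob
import os


def concat_text_files(pattern: str, dest: str) -> None:
    """Concatenate every file matching `pattern` into `dest`, adding CRLF
    after any file whose last line is unterminated, so it cannot run
    into the first line of the next file."""
    dest_abs = os.path.abspath(dest)
    # Expand the wildcard before creating dest, and skip dest if it matches
    sources = [p for p in sorted(glob.glob(pattern))
               if os.path.abspath(p) != dest_abs]
    with open(dest, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\r\n")  # Windows-style line terminator


if __name__ == "__main__":
    concat_text_files("*.txt", "merged.tmp")
```

Like the .tmp name in the batch command, the destination is kept out of the set of source files.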

If you want to sort and remove duplicate lines, PA has a PowerShell solution. Here is a pure batch solution that sorts and removes duplicate lines. Note that SORT is case-insensitive, so the duplicate removal is also case-insensitive. Blank lines are dropped as well, because FOR /F skips them.

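@echo off
setlocal disableDelayedExpansion
set "ln="
rem Concatenate every .txt file in the current folder into a temporary file
>merged.tmp (for %%F in (*.txt) do type "%%F")
rem Sort the merged lines; /rec raises the maximum line length SORT accepts
sort /rec 8192 merged.tmp /o merged.tmp2
rem Write a line only if it differs (ignoring case) from the previous one written
>merged.txt (
  for /f delims^=^ eol^= %%A in (merged.tmp2) do (
    set "newLn=%%A"
    setlocal enableDelayedExpansion
    if /i "!newLn!" neq "!ln!" (
      endlocal
      set "ln=%%A"
      echo(%%A
    ) else endlocal
  )
)
rem Clean up the temporary files
del merged.tmp merged.tmp2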
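The dedup loop relies on the sort: once lines are ordered case-insensitively, lines that differ only in case sit next to each other, so comparing each line with the last one written is enough. A Python sketch of the same idea follows (hypothetical name; `str.casefold` stands in for SORT's and `if /i`'s case-insensitive comparison, and which capitalization survives may differ from Windows SORT).

```python
def sort_unique_ignore_case(lines: list[str]) -> list[str]:
    """Sort lines case-insensitively, then keep a line only if it differs
    (ignoring case) from the last line kept. Blank lines are dropped,
    mirroring FOR /F, which skips them."""
    result: list[str] = []
    last = None
    for line in sorted((s for s in lines if s), key=str.casefold):
        if last is None or line.casefold() != last.casefold():
            result.append(line)
            last = line
    return result


if __name__ == "__main__":
    print(sort_unique_ignore_case(["pear", "Apple", "apple", "", "fig"]))
```

Because only neighbors are compared, the loop needs to remember a single previous line rather than every line seen so far.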

Upvotes: 1

rene

Reputation: 42414

Add this to your cmd file:

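set "filter=*.txt"
set "target=new_file.txt"
if exist newfile.tmp del newfile.tmp
rem Build up the concatenation in newfile.tmp (its name does not match the filter)
for %%a in (%filter%) do call :concat "%%a"

rem REMOVE THE SOURCE FILES! (careful, please!)
rem Do this before the rename, otherwise the merged file matches the filter and is deleted too
for %%a in (%filter%) do del /Q "%%a"
ren newfile.tmp "%target%"
goto :done

:concat
rem /b appends in binary mode, so no Ctrl+Z end-of-file marker is added
if exist newfile.tmp ( copy /b newfile.tmp+%1 newfile.tmp ) else ( copy /b %1 newfile.tmp )
goto :EOF

:done
echo ready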
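The ordering matters in this approach: the merged data lives under a name that the filter does not match until the sources are gone. A Python sketch of the same sequence (hypothetical function name; like the batch file, it deletes the source files, so try it on a copy of your data):

```python
import glob
import os


def merge_and_replace(pattern: str, target: str, tmp: str = "newfile.tmp") -> None:
    """Append each file matching `pattern` to `tmp`, delete the sources,
    then rename `tmp` to `target`. `tmp` must not match `pattern`."""
    sources = sorted(glob.glob(pattern))
    with open(tmp, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                out.write(src.read())  # binary append, like copy /b
    for path in sources:  # destructive step, as in the batch file
        os.remove(path)
    os.replace(tmp, target)  # unlike REN, this overwrites an existing target


if __name__ == "__main__":
    merge_and_replace("*.txt", "new_file.txt")
```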

Upvotes: 0

PA.

Reputation: 29339

Complementing lavinio's answer: to honor the OP's requirement to "remove duplicates and sort them", after concatenating the files, use the PowerShell sort and get-unique commands. Wrap gc in parentheses so the whole file is read before it is overwritten:

(gc allfiles.txt) | sort | get-unique > allfiles.txt
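get-unique only compares each line with its neighbor, which is why sort has to run first. A Python sketch of that "sort, then collapse adjacent repeats" step (hypothetical name; note that Python's `sorted` is case-sensitive, whereas PowerShell's sort ignores case by default, so results can differ for lines that differ only in case):

```python
from itertools import groupby


def sort_then_get_unique(lines: list[str]) -> list[str]:
    """Sort, then collapse runs of identical adjacent lines. groupby()
    groups only consecutive equal items, so the sort must come first."""
    return [key for key, _ in groupby(sorted(lines))]


if __name__ == "__main__":
    print(sort_then_get_unique(["b", "a", "b", "c", "a"]))
```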

Upvotes: 1
