Reputation: 1
How do I merge two text files in a .bat file? Or at least how to read the next line/test end of file in a .bat file?
Is it possible to merge two text files using a .bat script? The idea is not to append or concatenate, but to perform a merge operation based on the contents of each line. A simplified example would be to produce a sorted file from two sorted files, as in the pseudocode below (pseudocode because I can’t find a way to read the next line and test for the end of the file outside a for loop).
:TOP
set /p Line1=Read_Line(file1)
set /p Line2=Read_Line(file2)
:TEST
IF EOF(file1) GOTO FINISH2
IF EOF(file2) GOTO FINISH1
IF %Line1% < %Line2% (
    echo %Line1% - not in 2 >> File3
    set /p Line1=Read_Line(file1)
    GOTO TEST
) ELSE IF %Line1% > %Line2% (
    echo %Line2% - not in 1 >> File3
    set /p Line2=Read_Line(file2)
    GOTO TEST
) ELSE echo %Line1% in both >> File3
GOTO TOP
:FINISH1
echo %Line1% - not in 2 >> File3
set /p Line1=Read_Line(file1)
IF NOT EOF(file1) GOTO FINISH1
GOTO EOF
:FINISH2
echo %Line2% - not in 1 >> File3
set /p Line2=Read_Line(file2)
IF NOT EOF(file2) GOTO FINISH2
I tried with for loops, but branching inside a loop seems to stop the loop. I tried various things (including a parallel .bat) to find a way to move the cursor inside the file using set and <, but I can’t find how to do it right.
Upvotes: 0
Views: 2362
Reputation: 130819
Batch is really a terrible "language" to use for text processing. Nearly any other tool you can find would be better (easier to develop and faster to execute) than batch. I provide batch solutions because I enjoy the challenge, but I would always recommend some other language or tool over batch for text processing. That being said...
The following assumes both source files have already been sorted.
@echo off
setlocal enableDelayedExpansion
::define the files
set "in1=file1.txt"
set "in2=file2.txt"
set "out=file3.txt"
::define some simple macros
set "eof1=^!ln1^! gtr ^!cnt1^!"
set "eof2=^!ln2^! gtr ^!cnt2^!"
set "read1=if ^!ln1^! leq ^!cnt1^! set "txt1=" & <&3 set /p "txt1=" & set /a ln1+=1"
set "read2=if ^!ln2^! leq ^!cnt2^! set "txt2=" & <&4 set /p "txt2=" & set /a ln2+=1"
set "write1=echo(^!txt1^! - not in 2"
set "write2=echo(^!txt2^! - not in 1"
set "writeBoth=echo(^!txt1^! - in both"
::count the number of lines in each file
for /f %%N in ('find /v /c "" ^<"%in1%"') do set "cnt1=%%N"
for /f %%N in ('find /v /c "" ^<"%in2%"') do set "cnt2=%%N"
::setup redirection in outer block and merge the files in a loop
::The max number of iterations assumes there is no overlap (cnt1+cnt2)
::Break out of the loop as soon as both files have reached EOF.
set /a ln1=0, ln2=0, cnt=cnt1+cnt2
4<"%in2%" 3<"%in1%" (
  %read1%
  %read2%
  for /l %%N in (1 1 %cnt%) do (
    if %eof1% (
      if %eof2% goto :break
      %write2%
      %read2%
    ) else if %eof2% (
      %write1%
      %read1%
    ) else if .!txt1! lss .!txt2! (
      %write1%
      %read1%
    ) else if .!txt2! lss .!txt1! (
      %write2%
      %read2%
    ) else (
      %writeBoth%
      %read1%
      %read2%
    )
  )
) >"%out%"
:break
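For comparison with the recommendation above to use another language, here is a minimal sketch of the same three-way merge in Python. The function names merge_sorted and merge_files are made up for this illustration, the file names are the ones used in the batch script, and both inputs are assumed to be sorted in the order Python's < uses for strings (plain code point order, which can differ from the ordering used by batch's lss or by the sort command).

```python
def merge_sorted(lines1, lines2):
    """Merge two sorted lists of lines, labelling where each line was found.

    Hypothetical helper for illustration; assumes both lists are sorted
    in Python's default string order.
    """
    out = []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        a, b = lines1[i], lines2[j]
        if a < b:
            out.append(f"{a} - not in 2")
            i += 1
        elif b < a:
            out.append(f"{b} - not in 1")
            j += 1
        else:
            out.append(f"{a} - in both")
            i += 1
            j += 1
    # One input is exhausted; drain whatever remains of the other.
    out.extend(f"{a} - not in 2" for a in lines1[i:])
    out.extend(f"{b} - not in 1" for b in lines2[j:])
    return out


def merge_files(in1="file1.txt", in2="file2.txt", out="file3.txt"):
    """Read both inputs, merge them, and write the labelled result."""
    with open(in1) as f1, open(in2) as f2:
        merged = merge_sorted(f1.read().splitlines(), f2.read().splitlines())
    with open(out, "w") as f3:
        f3.writelines(line + "\n" for line in merged)
```

Calling merge_files() with no arguments reads file1.txt and file2.txt from the current directory and writes file3.txt. Unlike the SET /P approach, this sketch reads whole files into memory, so it trades memory for simplicity.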
Use of SET /P to read the files has restrictions. In particular, lines must be terminated by <carriage return><line feed> characters (Windows style). It will not work with lines terminated by <line feed> (Unix style).
EDIT
If you simply want to create a sorted merged document without duplicates, then I believe the following is an optimized version of sean's approach. It is not nearly as elegant as his, but I believe it is much faster. It also allows each line to begin with any character by setting the EOL option to a <line feed>. Note that this solution strips all blank lines from the output (as does sean's). Additional code could be added to preserve a single blank line.
@echo off
setlocal disableDelayedExpansion
set lf=^


::above 2 blank lines required
copy /b file1.txt+file2.txt file3.txt >nul
set "old="
(
  for /f eol^=^%lf%%lf%^ delims^= %%A in ('sort file3.txt') do (
    set "new=.%%A"
    setlocal enableDelayedExpansion
    if "!old!" neq "!new!" echo(!new:~1!
    for /f "delims=" %%B in ("!new!") do (
      endlocal
      set "old=%%B"
    )
  )
)>file4.txt
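The same idea (concatenate, sort, then emit a line only when it differs from the previous one) can be sketched in Python as follows. This is an illustration rather than a port: sorted_unique is a made-up name, and Python sorts by code point and compares case-sensitively, so the ordering may differ from what the Windows sort command produces.

```python
def sorted_unique(lines1, lines2):
    """Return the non-blank lines of both inputs, sorted, without duplicates.

    Hypothetical helper for illustration only.
    """
    result = []
    for line in sorted(lines1 + lines2):
        if line == "":
            continue  # blank lines are dropped, as in the batch version
        # After sorting, duplicates are adjacent, so comparing with the
        # last emitted line is enough to detect them.
        if not result or result[-1] != line:
            result.append(line)
    return result
```

Because duplicates are adjacent once the input is sorted, only the previously written line has to be remembered, which is the same role the old variable plays in the batch script.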
Upvotes: 2
Reputation: 15923
2 steps (sorting is not needed, as the find in step 2 checks the new file and only writes a line if it is not already there):
Merge the files:
copy file1.txt+file2.txt file3.txt
Remove duplicate lines (/i ignores case; omit it if Fred and FRED are to be treated as different):
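@echo off
rem start from an empty output file so earlier runs do not affect the result
type nul >file4.txt
for /f "tokens=* delims=" %%a in (file3.txt) do (
    rem find does a substring match, so a line contained in a longer, already written line is skipped too
    find /i "%%a" file4.txt >nul
    if errorlevel 1 >>file4.txt echo(%%a
)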
The resulting file is file4.txt.
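For readers who can use another language, a rough Python equivalent of this order-preserving approach might look like the sketch below. The name unique_in_order and its ignore_case parameter are invented for this example. Note one deliberate difference: the sketch compares whole lines, whereas find looks for the text anywhere inside a line of file4.txt.

```python
def unique_in_order(lines, ignore_case=True):
    """Keep the first occurrence of each non-blank line, preserving order.

    Hypothetical helper for illustration; ignore_case plays the role of
    find's /i switch.
    """
    seen = set()
    result = []
    for line in lines:
        key = line.casefold() if ignore_case else line
        # Blank lines are skipped, as for /f does in the batch loop.
        if line and key not in seen:
            seen.add(key)
            result.append(line)
    return result
```

Using a set for the lines already written avoids rescanning the output file for every input line, which is what makes the batch loop slow on large files.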
Upvotes: 1