user1495762

Reputation: 1

really merge in .bat file

How do I merge two text files in a .bat file? Or at least how to read the next line/test end of file in a .bat file?

Is it possible to merge two text files using a .bat script? The idea is not to append or concatenate, but to perform a merge operation based on the contents of each line. A simplified example would be to produce a sorted file from two sorted files, as in the pseudocode below (pseudocode because I can't find a way to read the next line and test for end of file outside a for loop).

:TOP
  set /p Line1=Read_Line(file1)
  set /p Line2=Read_Line(file2)
:TEST
  IF EOF(file1) GOTO FINISH2
  IF EOF(file2) GOTO FINISH1
  IF %Line1% < %Line2% (
      echo %Line1% - not in 2 >> File3
      set /p Line1=Read_Line(file1)
      GOTO TEST
  ) ELSE IF %Line1% > %Line2% (
      echo %Line2% - not in 1 >> File3
      set /p Line2=Read_Line(file2)
      GOTO TEST
  ) ELSE echo %Line1% in both >> File3
  GOTO TOP
:FINISH1
  rem file2 is exhausted: write out the rest of file1
  echo %Line1% - not in 2 >> File3
  set /p Line1=Read_Line(file1)
  IF NOT EOF(file1) GOTO FINISH1
  GOTO :EOF
:FINISH2
  rem file1 is exhausted: write out the rest of file2
  echo %Line2% - not in 1 >> File3
  set /p Line2=Read_Line(file2)
  IF NOT EOF(file2) GOTO FINISH2

I tried with for loops, but branching inside loops seems to stop the loop. I tried various things (including a parallel .bat) to find a way to move the cursor inside the file using set and < but can’t find how to do it right.

Upvotes: 0

Views: 2362

Answers (2)

dbenham

Reputation: 130819

Batch is really a terrible "language" to use for text processing. Nearly any other tool you can find would be better (easier to develop and faster to execute) than batch. I provide batch solutions because I enjoy the challenge, but I would always recommend some other language or tool over batch for text processing. That being said...

The following script assumes both source files have already been sorted.

@echo off
setlocal enableDelayedExpansion

::define the files
set "in1=file1.txt"
set "in2=file2.txt"
set "out=file3.txt"

::define some simple macros
set "eof1=^!ln1^! gtr ^!cnt1^!"
set "eof2=^!ln2^! gtr ^!cnt2^!"
set "read1=if ^!ln1^! leq ^!cnt1^! set "txt1=" & <&3 set /p "txt1=" & set /a ln1+=1"
set "read2=if ^!ln2^! leq ^!cnt2^! set "txt2=" & <&4 set /p "txt2=" & set /a ln2+=1"
set "write1=echo(^!txt1^! - not in 2"
set "write2=echo(^!txt2^! - not in 1"
set "writeBoth=echo(^!txt1^! - in both"

::count the number of lines in each file
for /f %%N in ('find /v /c "" ^<"%in1%"') do set "cnt1=%%N"
for /f %%N in ('find /v /c "" ^<"%in2%"') do set "cnt2=%%N"

::setup redirection in outer block and merge the files in a loop
::The max number of iterations assumes there is no overlap (cnt1+cnt2)
::Break out of the loop as soon as both files have reached EOF.
set /a ln1=0, ln2=0, cnt=cnt1+cnt2
4<"%in2%" 3<"%in1%" (
  %read1%
  %read2%
  for /l %%N in (1 1 %cnt%) do (
    if %eof1% (
        if %eof2% goto :break
        %write2%
        %read2%
    ) else if %eof2% (
        %write1%
        %read1%
    ) else if .!txt1! lss .!txt2! (
        %write1%
        %read1%
    ) else if .!txt2! lss .!txt1! (
        %write2%
        %read2%
    ) else (
        %writeBoth%
        %read1%
        %read2%
    )
  )
) >"%out%"
:break

Use of SET /P to read the files has the following restrictions:

  • Lines from both files must be terminated by <carriage return><line feed> characters (Windows style). It will not work with lines terminated by <line feed> (Unix style).
  • Maximum 1021 bytes (characters) per line, not including line terminators
  • Trailing control characters will be stripped from each line.
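
To show the line-reading technique on its own, here is a minimal sketch with the merge logic stripped out. It redirects one file into a parenthesized block, so each SET /P picks up the next line. The file name is reused from the script above, and the variable names in, cnt and txt are only illustrative. The same SET /P restrictions apply.

```bat
@echo off
setlocal enableDelayedExpansion

rem Illustrative input file name; adjust as needed.
set "in=file1.txt"

rem Count the lines first, because SET /P cannot signal end of file.
for /f %%N in ('find /v /c "" ^<"%in%"') do set "cnt=%%N"

rem Redirect the file once for the whole block.
rem Each SET /P inside the block then reads the next line.
<"%in%" (
  for /l %%N in (1 1 %cnt%) do (
    set "txt="
    set /p "txt="
    echo(%%N: !txt!
  )
)
```

Clearing txt before each read matters: SET /P leaves the variable unchanged when it reads an empty line, so without the reset a blank line would repeat the previous value.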

EDIT

If you simply want to create a sorted merged document without duplicates, then I believe the following is an optimized version of Sean's approach. It is not nearly as elegant as his, but I believe it is much faster. It also allows each line to begin with any character by setting the EOL option to a <line feed>. Note that this solution strips all blank lines from the output (as does Sean's). Additional code could be added to preserve a single blank line.

@echo off
setlocal disableDelayedExpansion
set lf=^


::above 2 blank lines required
copy /b file1.txt+file2.txt file3.txt >nul
set "old="
(
  for /f eol^=^%lf%%lf%^ delims^= %%A in ('sort file3.txt') do (
    set "new=.%%A"
    setlocal enableDelayedExpansion
    if "!old!" neq "!new!" echo(!new:~1!
    for /f "delims=" %%B in ("!new!") do (
      endlocal
      set "old=%%B"
    )
  )
)>file4.txt

Upvotes: 2

SeanC

Reputation: 15923

Two steps (sorting is not needed, because the find in step 2 checks the new file and writes a line only if it is not already there):

  1. merge the files:
    copy file1.txt+file2.txt file3.txt

  2. Remove duplicate lines (/i ignores case, omit if Fred and FRED are to be treated as different):

    @echo off
    for /f "tokens=* delims=" %%a in (file3.txt) do (
      find /i "%%a" file4.txt>>nul&&rem
      if errorlevel 1 echo %%a>>file4.txt
      ) 
    

The resulting file is file4.txt.
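
One caveat: find matches substrings, so a line such as Fred would be treated as a duplicate if Freddy were already in file4.txt. A possible variation, offered only as a sketch, swaps in findstr with /x so that only whole-line matches count. It assumes the same file names as above and assumes the lines contain no double quotes or backslashes, which findstr would need escaped in a /c: search string. Like the loop above, it skips blank lines and lines that begin with a semicolon.

```bat
@echo off
rem Start from an empty output file so FINDSTR always has a file to search.
type nul >file4.txt

for /f "usebackq tokens=* delims=" %%a in ("file3.txt") do (
  rem /x = match whole lines only, /l = literal text, /i = ignore case.
  rem Omit /i if Fred and FRED are to be treated as different.
  rem The redirection comes first so a line ending in a digit is not
  rem misread as a file handle number.
  findstr /x /l /i /c:"%%a" file4.txt >nul || >>file4.txt echo(%%a
)
```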

Upvotes: 1
