user3375545

Reputation: 3

Remove text between delims based on multiple criteria using batch

I am trying to remove entries from a vertical report that looks like this.

report start : hi good morning
report (1234) hi
10/10/2013
line unequal
good morning hi good morning (123:)
20131212020202312312********
report start : hi good evening
report (1234) hi
10/10/2013
good evening hi good evening (123:)
20131212020202312312********
report start : hi good morning
report (1234) hi
10/10/2013
good evening hi good evening (123:)
20131212020202312312********

I am trying to remove complete entries where "evening" is present and "morning" is not. In short, the report should end up like this:

report start : hi good morning
report (1234) hi
10/10/2013
line unequal
good morning hi good morning (123:)
20131212020202312312********
report start : hi good morning
report (1234) hi
10/10/2013
good evening hi good evening (123:)
20131212020202312312********

I had thought about concatenating each entry (everything up to and including its line of asterisks) into a single line, so that each line would end with the series of asterisks. The asterisk runs are always the same length. Then I could use findstr to remove entries, but how do I reconstruct the entire report? It must return to a vertical format. To add to the complexity, the entries are indented inconsistently in the txt file.

I have been unable to use "*" as a delimiter and therefore cannot use a for /f loop to concatenate the lines. This is as far as I've gotten.
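The concatenate, filter and split-back idea can be sketched outside batch to show that reconstruction is just the inverse of the join. This is a Python illustration, not a batch solution; the separator character `SEP` and the helper names `flatten`, `unflatten` and `keep` are hypothetical, and the sketch assumes `SEP` never occurs in the report.

```python
SEP = "\x1f"  # hypothetical record separator, assumed never to occur in the report


def flatten(lines):
    """Join each entry (up to and including its '********' line) into one record."""
    records, current = [], []
    for line in lines:
        current.append(line)  # stored verbatim, so indentation is preserved
        if line.rstrip().endswith("********"):
            records.append(SEP.join(current))
            current = []
    if current:  # trailing lines with no end marker form a final record
        records.append(SEP.join(current))
    return records


def unflatten(records):
    """Split the records back into one line per element (the vertical layout)."""
    return [line for record in records for line in record.split(SEP)]


def keep(record):
    """Drop an entry only if it mentions 'evening' and not 'morning'."""
    return "morning" in record or "evening" not in record


if __name__ == "__main__":
    report = [
        "report start : hi good evening",
        "   good evening hi good evening (123:)",
        "20131212020202312312********",
        "report start : hi good morning",
        "  good evening hi good evening (123:)",
        "20131212020202312312********",
    ]
    kept = [record for record in flatten(report) if keep(record)]
    print("\n".join(unflatten(kept)))
```

Because each line is stored verbatim, indentation survives the round trip; the only requirement is a separator that never appears in the data.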

Thanks

Upvotes: 0

Views: 124

Answers (4)

MC ND

Reputation: 70923

One more option, this time using intermediate temporary files.

@echo off
    setlocal enableextensions disabledelayedexpansion

    :: configure and clean output/temporary files
    set "inputFile=inputFile.txt"
    set "outputFile=outputFile.txt"
    set "tempFile=%temp%\%~nx0.tmp"
    break>"%tempFile%"
    break>"%outputFile%"

    :: retrieve end of section lines
    for /f "tokens=1 delims=:" %%a in ('findstr /n /l /e /c:"****" "%inputFile%"') do set "_sect.%%a=1"

    :: extract each section and test for inclusion in output file
    for /f "tokens=1,* delims=:" %%a in ('findstr /n "^" "%inputFile%"') do (
        echo(%%b>>"%tempFile%"
        if defined _sect.%%a (
            find /i "morning" "%tempFile%" >nul && ( type "%tempFile%">>"%outputFile%" ) 
            break>"%tempFile%"
        )
    )

    :: clean and exit
    del /q "%tempFile%" 2>nul
    endlocal

Upvotes: 0

dbenham

Reputation: 130819

Regular expressions can be your friend :) A tool like awk or sed could work well - free Windows ports are available.

I have written REPL.BAT - a hybrid JScript/batch utility that performs a regex search and replace on stdin and writes the results to stdout. It is pure script that runs natively on any Windows machine from XP onward. Full documentation is embedded within the script.

Assuming REPL.BAT is in your current directory, or better yet, somewhere within your PATH, then all you need is the following:

type source.txt|repl "^report start :(?:[\s\S](?!morning))*?evening(?:[\s\S](?!morning))*?^\d*\*{8}\r?\n" "" m >output.txt

The above uses the M option to enable searches across multiple lines, which requires loading the entire source file in memory. That might become problematic with really large input files. But this is still better than a pure batch solution using FOR /F, since that command also buffers the entire source file in memory.
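To see what the pattern removes, it can be tried on the sample data outside batch. The constructs it relies on (lazy `*?`, negative lookahead, `[\s\S]`, `^` in multiline mode) should behave the same in Python's `re` module as in JScript, so this Python sketch applies the pattern verbatim. The names `REPORT`, `END` and `remove_evening_only` are hypothetical, and the sample is assumed to end with a newline.

```python
import re

# dbenham's pattern, unchanged (split over two raw strings for width).
PATTERN = (r"^report start :(?:[\s\S](?!morning))*?evening"
           r"(?:[\s\S](?!morning))*?^\d*\*{8}\r?\n")

END = "20131212020202312312" + "*" * 8  # end-of-entry line from the question

REPORT = "\n".join([
    "report start : hi good morning",
    "report (1234) hi",
    "10/10/2013",
    "line unequal",
    "good morning hi good morning (123:)",
    END,
    "report start : hi good evening",
    "report (1234) hi",
    "10/10/2013",
    "good evening hi good evening (123:)",
    END,
    "report start : hi good morning",
    "report (1234) hi",
    "10/10/2013",
    "good evening hi good evening (123:)",
    END,
]) + "\n"  # assumption: the file ends with a newline


def remove_evening_only(text):
    """Delete whole entries that contain 'evening' but no 'morning'."""
    return re.sub(PATTERN, "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(remove_evening_only(REPORT), end="")
```

Two limits follow from the pattern itself: it needs a line break after the asterisk line, so an evening-only entry at the very end of a file without a trailing newline is not removed, and `^report` and `^\d*` do not allow leading whitespace, so indented entries would need the pattern adjusted.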

Upvotes: 1

Aacini

Reputation: 67216

@echo off
setlocal EnableDelayedExpansion

set i=0
set "morning="
set "evening="
for /F "delims=" %%a in (test.txt) do (
   set /A i+=1
   set "line[!i!]=%%a"
   set "line=%%a"
   if "!line:morning=!" neq "%%a" set morning=present
   if "!line:evening=!" neq "%%a" set evening=present
   if "!line:~-4!" equ "****" (
      set "remove="
      if defined evening if not defined morning set remove=true
      if not defined remove for /L %%i in (1,1,!i!) do echo !line[%%i]!
      set i=0
      set "morning="
      set "evening="
   )
)
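The batch above buffers each section's lines in `line[1..i]`, sets flags while reading, and decides whether to echo the buffer when it reaches a line ending in `****`. A rough Python rendition of that same single-pass logic, for readers who want to follow it step by step (the function name `filter_sections` is hypothetical; matching is lowercased because batch `!line:morning=!` substitution is case-insensitive, and FOR /F details such as skipped empty lines are not reproduced):

```python
def filter_sections(lines):
    """Buffer one section at a time; emit it unless it is evening-only."""
    kept, buffer = [], []
    morning = evening = False
    for line in lines:
        buffer.append(line)
        lowered = line.lower()  # batch substring replacement ignores case
        morning = morning or "morning" in lowered
        evening = evening or "evening" in lowered
        if line.endswith("****"):  # same end-of-section test as !line:~-4!
            if not (evening and not morning):
                kept.extend(buffer)
            buffer, morning, evening = [], False, False
    # Like the batch version, lines after the last end marker are never emitted.
    return kept


if __name__ == "__main__":
    sample = [
        "report start : hi good morning",
        "line unequal",
        "20131212020202312312********",
        "report start : hi good evening",
        "20131212020202312312********",
    ]
    print("\n".join(filter_sections(sample)))
```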

Upvotes: 0

Magoo

Reputation: 80023

@ECHO OFF
SETLOCAL
:: make a tempfile
:maketemp
SET "tempfile=%temp%\%random%"
IF EXIST "%tempfile%*" (GOTO maketemp) ELSE (ECHO.>"%tempfile%a")
:: Process file, count sections and record section numbers to remove
SET /a section=0
CALL :init
FOR /f "delims=" %%a IN (q22151608.txt) DO (
ECHO %%a|FINDSTR "evening" >NUL
 IF NOT ERRORLEVEL 1 SET found1=Y
ECHO %%a|FINDSTR "morning" >NUL
 IF NOT ERRORLEVEL 1 SET found2=Y
 ECHO %%a|FINDSTR /e "********" >NUL
 IF NOT ERRORLEVEL 1 CALL :endsection
)
:: Re-process file, count sections
SET /a section=0
CALL :init
(
FOR /f "delims=" %%a IN (q22151608.txt) DO (
 IF NOT DEFINED found1 CALL :switch
 IF DEFINED found2 ECHO(%%a
 ECHO %%a|FINDSTR /e "********" >NUL
 IF NOT ERRORLEVEL 1 CALL :init
)
)>newfile.txt
DEL "%tempfile%a"

GOTO :EOF

:switch
SET found1=Y
FIND "#%section%#" "%tempfile%a" >NUL
IF ERRORLEVEL 1 SET found2=Y
GOTO :eof

:endsection
IF DEFINED found1 IF NOT DEFINED found2 >>"%tempfile%a" ECHO(#%section%#
:init
SET "found1="
SET "found2="
SET /a section+=1
GOTO :eof

I used a file named q22151608.txt containing your data for my testing. Output goes to newfile.txt.

Your output description does not fit your problem definition. The "line unequal" line should not appear if I've interpreted your description correctly.

It is preferable to post real data, suitably censored, rather than artificial data. It's not clear where a section starts and ends. Even something as simple as changing the report number or timestamp would make the supplied data clearer.
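The batch above makes two passes: the first numbers the sections and writes `#n#` for each one to drop into a temporary file, and the second re-counts sections and echoes only those not listed. The same two-pass structure, sketched in Python with a set standing in for the temporary file (the names `sections_to_remove` and `write_kept` are hypothetical; matching is case-sensitive like the FINDSTR calls):

```python
def sections_to_remove(lines):
    """First pass: number the sections and return the numbers to drop."""
    remove, number = set(), 1
    has_evening = has_morning = False
    for line in lines:
        has_evening = has_evening or "evening" in line
        has_morning = has_morning or "morning" in line
        if line.rstrip().endswith("********"):  # end of a section
            if has_evening and not has_morning:
                remove.add(number)
            number += 1
            has_evening = has_morning = False
    return remove


def write_kept(lines, remove):
    """Second pass: re-count sections and emit lines of sections not in `remove`."""
    out, number = [], 1
    for line in lines:
        if number not in remove:
            out.append(line)
        if line.rstrip().endswith("********"):
            number += 1
    return out


if __name__ == "__main__":
    report = [
        "report start : hi good morning",
        "20131212020202312312********",
        "report start : hi good evening",
        "20131212020202312312********",
    ]
    print("\n".join(write_kept(report, sections_to_remove(report))))
```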

Upvotes: 1
