T-Diddy

Reputation: 125

Batch file to extract text from file

I have a log file I am attempting to extract particular lines from. When I crop the file to a few lines above and below, I am able to get them. However, there are multiple instances of what I am trying to find, which prevents me from using the FULL file.

Following is some code I have tried...

 for /f "tokens=1* delims=[]" %%a in ('find /n "    <Line Text="***********TEST1  TEST  TEST************" />" ^< TEST.LOG') do (set H=%%a
 )

 for /f "tokens=1* delims=[]" %%a in ('find /n "</Report>" ^< TEST.LOG') do (
 set T=%%a
 )

 for /f "tokens=1* delims=[]" %%a in ('find /n /v "" ^< TEST.LOG') do (
 if %%a GEQ !H! if %%a LEQ !T! echo.%%b
 )>> newfile.txt

I am hoping to get the following:

 <Line Text="***********TEST1  TEST  TEST************" />
 ~ALL LINES IN BETWEEN~
 </Report>

Upvotes: 1

Views: 2122

Answers (3)

Mofi

Reputation: 49127

The Windows command processor is designed for executing commands and executables, not for processing text files, so it is definitely the worst choice for filtering TEST.LOG. For the reasons, read my answer on How to read and print contents of text file line by line? in full. The batch file code described there in detail was used as the template for the batch file code below:

@echo off
setlocal EnableExtensions DisableDelayedExpansion
if not exist "Test.log" goto EndBatch
set "OutputLines="

(for /F delims^=^ eol^= %%I in ('%SystemRoot%\System32\findstr.exe /N "^" "Test.log"') do (
    set "Line=%%I"
    setlocal EnableDelayedExpansion
    if defined OutputLines (
        echo(!Line:*:=!
        if not "!Line:</Report>=!" == "!Line!" (
            endlocal & set "OutputLines="
        ) else endlocal
    ) else if not "!Line:<Line Text=!" == "!Line!" (
        echo(!Line:*:=!
        endlocal & set "OutputLines=1"
    ) else endlocal
))>"newfile.txt"

if exist "newfile.txt" for %%I in ("newfile.txt") do if %%~zI == 0 del "newfile.txt"

:EndBatch
endlocal

This batch file writes to newfile.txt all lines of Test.log from a line containing the string <Line Text up to and including a line containing the string </Report>, or up to the end of the file. Both strings are matched case-insensitively.

Note: The search string between !Line: and = cannot contain an equal sign, because the Windows command processor interprets the equal sign as the separator between the search string (here </Report> and <Line Text) and the replacement string (here an empty string in both cases). An asterisk * at the beginning of the search string is interpreted as an instruction to replace everything from the beginning of the line up to and including the first occurrence of the string that follows, not as a character to find in the line. Neither restriction matters for this use case.
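The two substitution forms from the note can be tried in isolation. The following is only a minimal sketch, not part of the batch file above: the value assigned to Line is a made-up sample that mimics one line of findstr /N output (line number, colon, text).

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Hypothetical sample value in the form "line number:text" as output by findstr /N.
set "Line=12:  </Report>"

rem The leading asterisk removes everything up to and including the first colon.
echo(!Line:*:=!

rem Containment test: removing the search string changes the value
rem only if the search string occurs in it (compared case-insensitively).
if not "!Line:</Report>=!" == "!Line!" echo Line contains the end marker

endlocal
```

Run as is, this should print the text after the colon, including its two leading spaces, followed by the message that the end marker was found.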

If the two lines marking the beginning and end of the block to extract are fixed and contain no variable part, the two comparisons can be done on the entire line instead of with a substring substitution, which also makes it possible to match strings containing an equal sign.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
if not exist "Test.log" goto EndBatch

set "BlockBegin= <Line Text="***********TEST1  TEST  TEST************" />"
set "BlockEnd= </Report>"
set "OutputLines="

(for /F delims^=^ eol^= %%I in ('%SystemRoot%\System32\findstr.exe /N "^" "Test.log"') do (
    set "Line=%%I"
    setlocal EnableDelayedExpansion
    if defined OutputLines (
        echo(!Line:*:=!
        if "!Line:*:=!" == "!BlockEnd!" (
            endlocal & set "OutputLines="
        ) else endlocal
    ) else if "!Line:*:=!" == "!BlockBegin!" (
        echo(!Line:*:=!
        endlocal & set "OutputLines=1"
    ) else endlocal
))>"newfile.txt"

if exist "newfile.txt" for %%I in ("newfile.txt") do if %%~zI == 0 del "newfile.txt"

:EndBatch
endlocal

This variant compares each entire line case-sensitively with the strings assigned to the environment variables BlockBegin and BlockEnd to determine on which line to start and on which line to stop the output.
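If the whole-line comparison should ignore case instead, the IF option /I could be used. The following is a small sketch under that assumption; the variable Text and its value are hypothetical and only illustrate the difference between the two comparisons.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

set "BlockEnd= </Report>"
rem Hypothetical line content that differs from BlockEnd only in case.
set "Text= </REPORT>"

rem Without /I the string comparison is case-sensitive.
if "!Text!" == "!BlockEnd!" (echo case-sensitive: equal) else echo case-sensitive: different

rem With /I the string comparison ignores case.
if /I "!Text!" == "!BlockEnd!" (echo case-insensitive: equal) else echo case-insensitive: different

endlocal
```

With these sample values, the first comparison should report the strings as different and the second as equal.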

To understand the commands used and how they work, open a command prompt window, run the following commands, and carefully read all the help pages displayed for each command.

  • del /?
  • echo /?
  • endlocal /?
  • findstr /?
  • for /?
  • goto /?
  • if /?
  • set /?
  • setlocal /?

Upvotes: 2

Ben Personick

Reputation: 3264

Updated:

You want to find <Line Text="***********TEST1 TEST TEST************" /> then print it and any line until the First </Report> is encountered, then look for the next <Line Text="***********TEST1 TEST TEST************" /> and print it and every following line until the next </Report> for every time it occurs throughout?

– Ben Personick 1 hour ago

OR do you just want to take from the first <Line Text="***********TEST1 TEST TEST************" /> to the first </Report>?

– Ben Personick 1 hour ago

Find <Line Text="***********TEST1 TEST TEST************" /> then print it and any line until the first </Report> is encountered, then look for the next <Line Text="***********TEST1 TEST TEST************" /> and print it and every following line until the next </Report>, for every time it occurs. I feel there should ONLY be 1 sequence; however, at times this situation could very well be possible. Thanks for asking, very solid question!

– T-Diddy 1 hour ago

Okay, then this should work the way you expect. However, if the log contains a lot of unexpected characters, it might make more sense to change how the lines are output and use SET instead of ECHO.

@(setlocal
  ECHO OFF
  SET "_LogFile=C:\Admin\TestLog.log"
  SET "_ResultFile=C:\Admin\TestLog.txt"
  SET "_MatchString_Begin=<Line Text="***********AAAAA BBBB CCCC************" />"
  SET "_MatchString_End=</Report>"
  SET "_Line#_Begin="
)

CALL :Main

( ENDLOCAL
  EXIT/B
)
:Main
  IF EXIST "%_ResultFile%" (
    DEL /F /Q "%_ResultFile%"
  )
  ECHO.&ECHO.== Processing ==&ECHO.
  FOR /F "Delims=[]" %%# IN ('
    Find /N "%_MatchString_Begin:"=""%" "%_LogFile%" ^| FIND "["
  ') DO (
    ECHO. Found Match On Line %%#
    SET /A "_Line#_Begin=%%#-1"
    CALL :Output
  )
  ECHO.&ECHO.== Completed ==&ECHO.&ECHO.Results to Screen will Start in 5 Seconds:
  timeout 5
  Type "%_ResultFile%"
GOTO :EOF

:Output
  FOR /F "SKIP=%_Line#_Begin% Tokens=* usebackq" %%_ IN (
    "%_LogFile%"
  ) DO (
    ECHO(%%_
    ECHO("%%_" | FIND /I "%_MatchString_End%" >NUL&&(
      GOTO :EOF
    )
  )>>"%_ResultFile%"
GOTO :EOF

Original Response Only Shows First Matched Content, based on this Comment:

This works great with my "cropped" file. However, in the ORIGINAL, the ONLY unique line I have is <Line Text="***********AAAAA BBBB CCCC************" />. I can't seem to use the full line, as my batch just exits. I am able to input "***********AAAAA BBBB CCCC************" without the batch exiting, but that string exists elsewhere, so the other parameters are required to make the match unique within the file. I also want the next one following it in sequence; otherwise this "</Report>" exists above in another section I don't want, which I believe is causing the issue. – T-Diddy 3 mins ago

Okay, I thought so.

Try this:

@(setlocal
  ECHO OFF
  SET "_LogFile=C:\Admin\TestLog.log"
  SET "_MatchString_Begin=<Line Text="***********AAAAA BBBB CCCC************" />"
  SET "_MatchString_End=</Report>"
  SET "_Line#_Begin="
  SET "_Line#_End="
)
REM SET
FOR /F "Delims=[]" %%# IN ('
  Find /N "%_MatchString_Begin:"=""%" "%_LogFile%" ^| FIND "["
') DO (
  IF NOT DEFINED _Line#_Begin (
    SET /A "_Line#_Begin=%%#-1"
    ECHO.SET /A "_Line#_Begin=%%#-1"
  )
)
FOR /F "SKIP=%_Line#_Begin% Tokens=* usebackq" %%_ IN (
  "%_LogFile%"
) DO (
  IF NOT DEFINED _Line#_End (
    ECHO(%%_
    ECHO("%%_" | FIND /I "%_MatchString_End%" &&(
      SET "_Line#_End=1"
    )
  )
)
PAUSE

Upvotes: 2

Hackoo

Reputation: 18837

You can give this code a try:

@echo off
Title Extract Data between two tags
Set "InputFile=InputFile.txt"
Set From_Start="<Line"
Set To_End="</Report>"
Set "OutputFile=OutputFile.txt"
Call :ExtractData %InputFile% %From_Start% %To_End%
Call :ExtractData %InputFile% %From_Start% %To_End%>%OutputFile%
If Exist %OutputFile% Start "" %OutputFile%
Exit
::'*************************************************************
:ExtractData <InputFile> <From_Start> <To_End>
(
echo Set fso = CreateObject^("Scripting.FileSystemObject"^)
echo Set f=fso.opentextfile^("%~1",1^)
echo Data = f.ReadAll
echo Data = Extract(Data,"(%~2.*\r\n)([\w\W]*)(\r\n)(%~3)"^)
echo WScript.StdOut.WriteLine Data
echo '************************************************
echo Function Extract(Data,Pattern^)
echo    Dim oRE,oMatches,Match,Line
echo    set oRE = New RegExp
echo    oRE.IgnoreCase = True
echo    oRE.Global = True
echo    oRE.Pattern = Pattern
echo    set Matches = oRE.Execute(Data^)
echo    If Matches.Count ^> 0 Then Data = Matches^(0^).SubMatches^(1^)
echo    Extract = Data
echo End Function
echo '************************************************
)>"%tmp%\%~n0.vbs"
cscript //nologo "%tmp%\%~n0.vbs"
If Exist "%tmp%\%~n0.vbs" Del "%tmp%\%~n0.vbs"
exit /b
::****************************************************

Upvotes: 0
