Reputation: 11
The idea is to find the URLs that returned a 404 error, together with the IDs listed above them (to show which IDs the URLs belong to), and also to pick up the FileName line and write all of it to the output file.
I have been trying to loop findstr so that it looks up the line preceding each previously found line number. Can anybody help?
Sample file:
FileName: LastABC-1563220.xml
-------------------------------
123456786
12348
1234DEF
-------------------------------
http://Product.com/1234DEF
HTTP/1.1 404 Not Found - 0.062000
http://Product.com/1234DEF_1
HTTP/1.1 200 OK - 0.031000
123456785
12349
1234EFG
-------------------------------
http://Product.com/1234EFG
HTTP/1.1 200 OK - 0.031000
123456784
12340
1234FGH
-------------------------------
http://Product.com/1234FGH
HTTP/1.1 200 OK - 0.031000
http://Product.com/1234FGH_1
HTTP/1.1 404 Not Found - 0.079000
http://Product.com/1234FGH_2
HTTP/1.1 404 Not Found - 0.067000
http://Product.com/1234FGH_4
HTTP/1.1 404 Not Found - 0.047000
Desired output:
FileName: LastABC-1563220.xml
123456786 12348 1234DEF
http://Product.com/1234DEF
123456784 12340 1234FGH
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4
Script I have so far:
del "%FailingURLS%" 2>nul
set numbers=
for /F "delims=:" %%a in ('findstr /I /N /C:"404 Not Found" %Formatedfile%') do (
    set /A before=%%a-1
    set "numbers=!numbers!!before!: "
)
(for /F "tokens=1* delims=:" %%a in ('findstr /N "^" %Formatedfile% ^| findstr /B "%numbers%"') do echo %%b) > %FailingURLS%
Upvotes: 1
Views: 200
Reputation: 34909
Here is a script (let us call it extract-failed-urls.bat) that demonstrates a possible way to fulfil your task, with quite a few explanatory rem remarks to help you understand what happens:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (`%~1` represents the first command line argument)
set "_URLP=://" & rem // (partial string that every listed URL contains)
set "_RESP=HTTP/1.1" & rem // (partial string that every response begins with)
set "_ERRN=404" & rem // (specific error number in response to recognise)
rem // Determine the total number of lines contained in the given file:
(for /F %%C in ('^< "%_FILE%" find /C /V ""') do set "CNT=%%C") || goto :EOF
rem // Read from the given file:
< "%_FILE%" (
    rem // Clear IDs and URL buffers, and preset flag:
    set "IDS=" & set "URL=" & set "FLAG=#"
    setlocal EnableDelayedExpansion
    rem // Read and write first line of file separately:
    set /A "CNT-=1" & set "LINE=" & set /P LINE="" & < nul set /P ="!LINE!"
    rem // Loop through the remaining lines:
    for /L %%I in (1,1,!CNT!) do (
        rem // Read a line and process only non-empty one:
        set /P LINE="" && (
            rem // Try to split off response prefix:
            set "REST=!LINE:*%_RESP% =!"
            rem // Determine kind of current line:
            if "!LINE:-=!" == "" (
                rem // Line contains only hyphens `-`, so clear URL buffer:
                set "URL="
            ) else if not "!LINE!" == "!LINE:*%_URLP%=!" (
                rem // Line contains an URL, so store to URL buffer, set flag:
                set "URL=!LINE!" & set "FLAG=#"
            ) else if "!LINE!" == "%_RESP% !REST!" (
                rem // Line contains a response, so gather number:
                for /F %%R in ("!REST!") do (
                    rem /* Specific error encountered, hence write IDs, if any,
                    rem    clear IDs buffer, then write stored URL, if any: */
                    if "%%R" == "%_ERRN%" (
                        if defined IDS echo/& echo(!IDS!
                        set "IDS=" & if defined URL echo(!URL!
                    )
                )
                rem // Clear URL buffer and set flag:
                set "URL=" & set "FLAG=#"
            ) else (
                rem /* No other condition fulfilled, hence line contains an ID,
                rem    so put ID into IDs buffer, clear URL buffer and flag: */
                if defined FLAG (set "IDS=!LINE!") else set "IDS=!IDS! !LINE!"
                set "URL=" & set "FLAG="
            )
        )
    )
    endlocal
)
endlocal
exit /B
To run it against an input file named sample.txt, use a command line like this:
extract-failed-urls.bat "sample.txt"
To write the output to another file named failed-urls.txt, use this:
extract-failed-urls.bat "sample.txt" > "failed-urls.txt"
With the data from the sample input file from the question, the output would be the following:
FileName: LastABC-1563220.xml
123456786 12348 1234DEF
http://Product.com/1234DEF
123456784 12340 1234FGH
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4
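This approach distinguishes between the following types of input lines, each of which triggers a specific action (as described by the remarks in the script):
the header line (the first line of the file, FileName: ...): it is written to the output as is;
a separator line (consisting only of hyphens, like -------------------------------): the URL buffer is cleared;
a line containing ://: it is treated as a URL and stored in the URL buffer;
a line beginning with HTTP/1.1 + SPACE: it is treated as a response, and if its status number is 404: the buffered IDs (if any) are written, followed by the stored URL;
any other line: it is treated as an ID and collected in the IDs buffer.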
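The way the script tells these line types apart can be tried in isolation. The following is only a minimal sketch, assuming the same three markers as the script (hyphen-only line, `://`, and `HTTP/1.1` followed by a space); the `:CLASSIFY` label and the hard-coded sample lines are hypothetical and exist purely for this demonstration:

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Hypothetical demo: feed a few sample lines to the classifier below.
for %%L in (
    "-------------------------------"
    "http://Product.com/1234DEF"
    "HTTP/1.1 404 Not Found - 0.062000"
    "HTTP/1.1 200 OK - 0.031000"
    "1234DEF"
) do (
    set "LINE=%%~L"
    call :CLASSIFY
)
endlocal
exit /B

:CLASSIFY
rem // Remove everything up to and including the response prefix, if present:
set "REST=!LINE:*HTTP/1.1 =!"
if "!LINE:-=!" == "" (
    rem // Nothing remains after deleting all hyphens, so it is a separator:
    echo separator: !LINE!
) else if not "!LINE!" == "!LINE:*://=!" (
    rem // Deleting up to `://` changed the line, so it contains an URL:
    echo URL: !LINE!
) else if "!LINE!" == "HTTP/1.1 !REST!" (
    rem // Line begins with the prefix; the first token of the rest is the status:
    for /F %%R in ("!REST!") do echo response with status %%R: !LINE!
) else (
    rem // No other condition fulfilled, so treat the line as an ID:
    echo ID: !LINE!
)
exit /B
```

Each test relies on variable substring substitution: if deleting a pattern leaves the line unchanged, the pattern does not occur in it. The order of the tests matters, because the separator check has to run before the generic fallback that treats everything else as an ID.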
Here is a simpler approach that relies on the fact that an ID block in the input file always contains three lines, then a hyphen-only line follows, and then URL and response pairs occur (if not, an error message appears):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (`%~1` represents the first command line argument)
set "_URLP=://" & rem // (partial string that every listed URL contains)
set "_RESP=HTTP/1.1" & rem // (partial string that every response begins with)
set "_ERRN=404" & rem // (specific error number in response to recognise)
rem // Determine the total number of lines contained in the given file:
(for /F %%C in ('^< "%_FILE%" find /C /V ""') do set "CNT=%%C") || goto :EOF
rem // Read from the given file:
< "%_FILE%" (
    rem // Clear IDs buffer and such for previous lines:
    set "IDS=#" & set "PREV1=" & set "PREV2="
    setlocal EnableDelayedExpansion
    rem // Read and write first line of file separately:
    set /A "CNT-=1" & set "LINE=" & set /P LINE="" & < nul set /P ="!LINE!"
    rem // Read and check second line of file separately:
    set /A "CNT-=1" & set "LINE=" & set /P LINE="" & if not "!LINE:-=!" == "" goto :ERROR
    rem // Loop through the remaining lines:
    set /A "CNT/=2" & for /L %%I in (1,1,!CNT!) do (
        rem // Read a line and process only non-empty one:
        set /P LINE1="" && (
            rem // Read another line and process only non-empty one:
            set /P LINE2="" && (
                rem // Determine kind of first line:
                if not "!LINE1!" == "!LINE1:*%_URLP%=!" (
                    rem /* First line contains an URL, so next line must be a response;
                    rem    hence try to split off response prefix: */
                    set "REST=!LINE2:*%_RESP% =!"
                    rem // Check second line whether it is really a response:
                    if "!LINE2!" == "%_RESP% !LINE2:*%_RESP% =!" (
                        rem // Line indeed contains a response, so gather number:
                        for /F %%R in ("!REST!") do (
                            rem /* Specific error encountered, hence write IDs, if any,
                            rem    clear IDs buffer, then write URL from first line: */
                            if "%%R" == "%_ERRN%" (
                                if defined IDS echo/& echo(!IDS!
                                set "IDS=" & echo(!LINE1!
                            )
                        )
                    ) else goto :ERROR
                    rem // Clear buffers for previous lines:
                    set "PREV1=" & set "PREV2="
                ) else (
                    rem /* First line does not contain an URL, so it contains an ID,
                    rem    hence check if buffers for previous lines already contain
                    rem    data, which must be IDs, so store them all in IDs buffer,
                    rem    and check if the second line contains only hyphens `-`: */
                    if defined PREV1 if "!LINE2:-=!" == "" (
                        set "IDs=!PREV1! !PREV2! !LINE1!"
                    ) else goto :ERROR
                    rem // Store both lines into buffer for previous lines:
                    set "PREV1=!LINE1!" & set "PREV2=!LINE2!"
                )
            ) || exit /B 0
        ) || exit /B 0
    )
    endlocal
)
endlocal
exit /B
:ERROR
if defined IDS > con echo/
if "!" == "" endlocal
>&2 echo ERROR: expected file format violated!
exit /B 2
The calling convention as well as the output based on your input data are the same as above.
Upvotes: 0
Reputation: 67216
This is the way I would do it:
@echo off
setlocal EnableDelayedExpansion
del PreviousLines.txt 2>nul
set "ids="
(for /F "delims=" %%a in (test.txt) do (
    set "line=%%a"
    if "!line:~0,9!" equ "FileName:" (
        echo(!line!>> PreviousLines.txt
    ) else if "!line:~0,5!" equ "http:" (
        if defined ids echo(!ids!>> PreviousLines.txt
        set "ids="
        echo(!line!>> PreviousLines.txt
    ) else if "!line:~0,4!" equ "HTTP" (
        rem It is an "OK" or "Not Found" line...
        rem If it is "Not Found", show the previous lines
        if "!line:Not Found=!" neq "!line!" type PreviousLines.txt
        rem In any case, reset the previous lines
        del PreviousLines.txt 2>nul
        set "ids="
    ) else if "!line:~0,5!" neq "-----" (
        set "ids=!ids!!line! "
    )
)) > FailingURLS.txt
Output:
FileName: LastABC-1563220.xml
123456786 12348 1234DEF
http://Product.com/1234DEF
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4
I don't understand why you show the 123456784 12340 1234FGH ids before http://Product.com/1234FGH_1, because those ids belong to http://Product.com/1234FGH, which is OK...
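If the ids are nevertheless wanted before the first failing URL of their block, as in the desired output of the question, one possible adjustment is to keep the ids until the next id block starts instead of resetting them at every response line. This is only a sketch under that assumption and not the method used above: the `url` and `inIds` variables are introduced here for illustration, and the input name test.txt is taken from the script above.

```batch
@echo off
setlocal EnableDelayedExpansion
set "ids=" & set "url=" & set "inIds="
(for /F "usebackq delims=" %%a in ("test.txt") do (
    set "line=%%a"
    if "!line:~0,9!" equ "FileName:" (
        rem Header line: pass it through unchanged
        echo(!line!
    ) else if "!line:~0,5!" equ "http:" (
        rem Remember the URL; the id block (if any) has ended
        set "url=!line!"
        set "inIds="
    ) else if "!line:~0,4!" equ "HTTP" (
        rem Response line: act only on "404 Not Found"
        if "!line:404 Not Found=!" neq "!line!" (
            rem Show the pending ids once per block, then the failing URL
            if defined ids echo(!ids!
            set "ids="
            echo(!url!
        )
    ) else if "!line:~0,5!" neq "-----" (
        rem Id line: the first id of a block replaces ids kept from an earlier block
        if defined inIds (set "ids=!ids! !line!") else set "ids=!line!"
        set "inIds=1"
    )
)) > FailingURLS.txt
```

Because the ids are cleared only after they have been printed or when a new id block begins, a block whose first URL is OK still gets its ids printed ahead of its first 404 URL, while a block without any 404 prints nothing.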
Upvotes: 1
Reputation: 38613
Your question is too broad as it stands, so the following is an example showing one method of retrieving the '404' URLs from the file, which I assume to be your main issue.
@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
Set "Src=formattedfile.txt"
Set "Str=404 Not Found"
(Set LF=^
% 0x0A %
)
For /F %%A In ('Copy /Z "%~f0" Nul')Do Set "CR=%%A"
SetLocal EnableDelayedExpansion
FindStr /RC:".*!CR!*!LF!.*%Str%" "%Src%"
EndLocal
Pause
Just modify the value on line 3 to match the name of your formatted text file.
Output from your provided file content:
http://Product.com/1234DEF
http://Product.com/1234FGH_1
http://Product.com/1234FGH_2
http://Product.com/1234FGH_4
Press any key to continue . . .
Upvotes: 0