Reputation: 145
@ECHO OFF
SET InFile=nsb.txt
SET OutFile=Output.txt
IF EXIST "%OutFile%" DEL "%OutFile%"
SET TempFile=Temp.txt
IF EXIST "%TempFile%" DEL "%TempFile%"
FOR /F "tokens=*" %%A IN ('FINDSTR /N "wordA" "%InFile%"') DO (
CALL :RemovePrecedingWordA "%%A"
FOR /F "tokens=1 delims=:" %%B IN ('ECHO.%%A') DO (
MORE +%%B "%InFile%"> "%TempFile%"
FINDSTR /V "wordB" "%TempFile%">> "%OutFile%"
FOR /F "tokens=*" %%C IN ('FINDSTR "wordB" "%InFile%"') DO (
CALL :RemoveWordB "%%C"
IF EXIST "%TempFile%" DEL "%TempFile%"
GOTO :eof
)
)
)
:RemoveWordB
REM Replace "wordB" with a character that we don't expect in text that we will then use as a delimiter (` in this case)
SET LastLine=%~1
SET LastLine=%LastLine:wordB=`%
FOR /F "tokens=1 delims=`" %%A IN ('ECHO.%LastLine%') DO ECHO.%%A>> "%OutFile%"
GOTO :eof
OK, so I found this code and I can see how it works. It does what I am looking for, but I don't know how to edit it to work for my case.
I was wondering whether there is a way to extract data based on line numbers instead, since every block of my content has the same number of lines. For example, could code pull out lines 4000 to 4050, then 4051 to 4101 for the next block, and so on?
Thank you for any ideas that will work.
Upvotes: 0
Views: 152
Reputation: 79982
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=q64237051.txt"
:: remove variables starting #
FOR /F "delims==" %%a In ('set # 2^>Nul') DO SET "%%a="
set /a #=0
:: construct full base-filename
set "fullname1=%sourcedir%\%filename1%"
:: find all "start of block" line numbers
for /f "delims=:" %%a in ('findstr /n /x "{" "%fullname1%"') do set /a "#x!#!=%%a"&set /a #+=1&set /a "#s!#!=%%a"&set /a "#x!#!=10000+%%a"
:: for information - number of blocks found (#) start line number of block n (#sn) start line number of next block (#xn)
REM set #
for /f "delims=" %%a in ('dir /b /a-d "%sourcedir%\*.txt" ') do if "%%a" neq "%filename1%" (
rem remove variables starting #M
FOR /F "delims==" %%n In ('set #M 2^>Nul') DO SET "%%n="
rem Find differences between basefile "filename1" and other files; number lines, then extract line numbers.
(for /f "delims=: " %%n in ('FC /N "%fullname1%" "%sourcedir%\%%a"') do echo %%n|findstr /b /R "[1-9]")>"%destdir%\{"
rem then record the start line of the blocks within which the line appears
(
for /f "usebackq" %%L in ("%destdir%\{") do for /L %%b in (1,1,%#%) do if %%L gtr !#s%%b! if %%L lss !#x%%b! set "#M=!#M! !#s%%b!"
rem and regurgitate lines in the blocks that start on lines #M
set /a #L=0
set "#R="
for /f "usebackqdelims=" %%L in ("%sourcedir%\%%a") do (
set /a #L+=1
if defined #R (
echo %%L
if "%%L"=="}," set "#R="
if "%%L"=="}" set "#R="
) else (
for %%r in (!#M!) do if %%r==!#L! set "#R=Y"
if defined #R echo {
)
)
)>"%destdir%\%%~na.diff"
)
GOTO :EOF
You would need to change the settings of sourcedir and destdir to suit your circumstances. The listing uses a setting that suits my system.
I used a file named q64237051.txt containing your data for my testing.
First we must clarify the requirements. I've assumed that a difference file is required containing the entire contents of each {..} block where a difference is found.
So - first establish the directory and filenames required and specify the filename which contains the base against which all differences are to be detected.
In this routine, I use variables starting # for every variable used by the process, so clear out any existing #... (unlikely to be any, but still...) and establish delayedexpansion processing so that !var! can be used within blocks to access the dynamic value of variables.
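As a minimal sketch of why delayed expansion matters inside a parenthesised block (the counter name #n and the list a b c are made up for this illustration, not taken from the listing):

```batch
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
:: #n is a hypothetical counter used only for this demonstration
SET /A #n=0
:: Percent-style expansion is resolved once, when the whole block is parsed,
:: so it shows the value from before the loop started.
:: Exclamation-style expansion is resolved each time the line executes.
FOR %%i IN (a b c) DO (
  SET /A #n+=1
  ECHO parse-time: %#n%   run-time: !#n!
)
```

The parse-time column stays at 0 on every pass, while the run-time column follows the counter as it changes.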
Look in the base file for all lines "{" by using findstr /n to number the lines found and /x for an exact match to {. This produces lines of the form linenumber:linedata, so use for /f with : as a delimiter to record those line numbers as #s (starting) and #x (next-starting) line numbers.
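This step can be tried in isolation. The sketch below assumes a hypothetical file sample.txt in the current directory and only prints the line numbers, rather than storing them in #s and #x variables as the listing does:

```batch
@ECHO OFF
SETLOCAL
:: sample.txt is a hypothetical file name used only for this sketch
SET "file=sample.txt"
:: findstr /n prefixes each matching line with "linenumber:"
:: findstr /x keeps only lines that consist of exactly "{"
:: for /f with delims=: then returns just the number as the first token
FOR /F "delims=:" %%a IN ('findstr /n /x "{" "%file%"') DO ECHO block starts at line %%a
```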
The effect can be seen by un-commenting the set # command. Note that # is used as a counter.
So - into the meat of the matter. %%a is set to each filename found in sequence, and we skip the base file. Then we are processing each file "%%a"... first clear every #M variable.
Next, use fc /n to find the differences. This will produce lines of the format n:data for the lines of interest and also other lines that can be ignored. Get the first token of each line from the fc /n report using spaces and colons as delimiters. The lines of interest will survive the filter of a regular expression "numeric" applied via a findstr with option /b so the match is applied to the beginning of the line. Note that no line number will start 0, so the octal-processing that may otherwise be needed is not required. Put the resultant list of line-numbers-where-a-difference-is-found into a temporary file { (just a filename, nothing magical about it).
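To see what this filter produces on its own, the sketch below runs the same pipeline against two hypothetical files, base.txt and other.txt, and prints the surviving line numbers to the console instead of writing them to a temporary file:

```batch
@ECHO OFF
SETLOCAL
:: base.txt and other.txt are hypothetical file names used only for this sketch
:: FC /N numbers the lines it reports; delims=": " makes the first token
:: either a line number or a word from FC's own headings
:: findstr /b /R "[1-9]" keeps only tokens that begin with a digit 1-9
FOR /F "delims=: " %%n IN ('FC /N "base.txt" "other.txt"') DO ECHO %%n|findstr /b /R "[1-9]"
```

Heading lines from FC have a first token that does not start with a digit, so they are dropped by the findstr filter.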
Next step is parenthesised to allow all ECHOed output to be gathered into the differences file.
Now process the { file containing the line numbers of all differences (and their immediate neighbours) and see which block each fits into. %#% is the block-count. #M will be built with a space-separated list of the required start-of-block line numbers (and there will be duplicates, but this is of no consequence).
Initialise #L (line number) and #R (regurgitate flag) and process each line in the file %%a.
Increment the line number.
If we are regurgitating, echo the line and then clear #R if we have an end-of-block (}, or no doubt }).
If not regurgitating, detect whether we are at a line number which matches a start-of-required-block. If so, turn on regurgitation by setting #R non-empty and echo the { at the start of a new block.
Done!
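For the fixed-size case raised in the question (lines 4000 to 4050, then 4051 to 4101, and so on), line numbers alone may be enough. The following is only a sketch under assumptions: the input name nsb.txt comes from the question's script, while block.txt, first and last are hypothetical. It numbers lines with findstr /n "^" rather than a #L counter, because for /f skips empty lines and a counter incremented inside it would drift on files that contain any.

```batch
@ECHO OFF
SETLOCAL
:: outfile, first and last are assumed values for illustration only
SET "infile=nsb.txt"
SET "outfile=block.txt"
SET /A first=4000
SET /A last=4050
:: findstr /n "^" emits every line, including empty ones, as linenumber:text
:: tokens=1* puts the number in %%a and the rest of the line in %%b
(
FOR /F "tokens=1* delims=:" %%a IN ('findstr /n "^" "%infile%"') DO (
  IF %%a GEQ %first% IF %%a LEQ %last% ECHO(%%b
)
)>"%outfile%"
```

One known limitation of this form: because : is the delimiter, any colons at the very start of a line's text are stripped. To pull out the next block, change first and last (here to 4051 and 4101) and run it again with a different outfile.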
Note that the metavariable used in for is case-sensitive. I use upper-case L because lower-case can be confused by some with other characters.
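A one-line sketch of that case-sensitivity, using made-up list values: %%a and %%A are two separate metavariables, so they can be nested without clashing.

```batch
@ECHO OFF
:: %%a and %%A are distinct metavariables; the values lower and UPPER are arbitrary
FOR %%a IN (lower) DO FOR %%A IN (UPPER) DO ECHO %%a %%A
```

This echoes both values on one line, which shows the inner loop did not overwrite the outer variable.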
Upvotes: 1