Reputation: 3611
Task in CMD.
1) How can I test whether a string contains another string? I checked the manual here for the Boolean test "does string exist?", but either I can't understand the example or it does not work for me. The code below is just an attempt: I am trying to compare strings so that I can filter the lines that contain an <a> tag.
FOR /f "tokens=* delims= usebackq" %%c in ("%source%") DO (
    echo %%c
    IF %%c == "<a" (pause)
)
So while the file is being read, the script should pause whenever a line contains a link.
2) I have one more question. I need to filter the lines whose link points to a specific file, and get the content of the link. My original idea was to use findstr with a regex, but it does not seem to support sub-patterns. The next problem would be how to get the result into a variable.
set "pdf=0_1_en.pdf"
type "%source%" | grep "%pdf%" | findstr /r /c:"%pdf%.*>(.*).*</a>"
So in summary, I want to go through the file and, if there is a link like this one (REPAIRED):
<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>
get the title GEN 0.1 Preface. Note that there are also similar links to the same file that contain an image, not text, inside the <a> tag.
Code based on Aacini's answer, still to be changed a little bit:
@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%
rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
    set "line=%%c"
    rem Test if the line contain a "tag" that start with "<a" string:
    set "tag=!line:*<a=!"
    if not "!tag!" == "!line!" (
        rem Take the string in tag that end in ">"
        for /F "delims=^>" %%a in ("!tag!") do set "link=%%a"
        echo Link found: !link!
        if "!link!" == "GEN 0.1 Preface" echo Seeked link found
    )
)
pause
Still not finished
Upvotes: 0
Views: 1452
Reputation: 67216
Although your question is extensive, it does not provide many details, so I had to assume several points, because I don't know much about .PDF files, tags, etc.
@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"
rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
    set "line=%%c"
    rem Test if the line contain "<a>" tag:
    set "tag=!line:*<a>=!"
    if not "!tag!" == "!line!" (
        rem Test if "<a>" tag contain the anchor pdf file:
        if not "!tag:%pdf%=!" == "!tag!" (
            rem Get the value of "<b>" sub-tag
            set "tag=!tag:<b>=$!"
            set "tag=!tag:</b>=$!"
            for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
            echo Title found: "!title!"
        )
    )
)
pause
Any missing point can be added or fixed if you give me precise details about it.
EDIT: I fixed the program above following the latest indications from the OP. I used the $ character to extract the title value; if this character may appear in the original tag, it must be replaced by another character that is not used in the text.
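The delimiter trick can be tried in isolation on a single string. The following is only a sketch: the sample line is hypothetical (inner quotes are left out to keep it simple), and the `sep` variable is a name introduced here to show how the delimiter could be swapped for another unused character.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample line, not taken from the real HTML file.
set "tag=<a href=EK_GEN_0_1_en.pdf class=uline><b>GEN 0.1 Preface</b></a>"

rem Assumption: "#" never occurs in the line; change it here if it does.
set "sep=#"

rem Replace both the opening and the closing "b" tag by the delimiter.
set "tag=!tag:<b>=%sep%!"
set "tag=!tag:</b>=%sep%!"

rem The line now reads  prefix#title#suffix, so the second token is the title.
for /F "tokens=2 delims=%sep%" %%b in ("!tag!") do set "title=%%b"
echo Title found: "!title!"
```

With the sample line above this should print `Title found: "GEN 0.1 Preface"`.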
I tested this program with this "GEN 0 GENERAL.html" example file:
Line one
<a>href="/Dokumenter/EK_GEN_0_X_en.pdf" class="uline"><b>GEN 0.X Preface</b></a>
Line three
<a>href="/Dokumenter/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>
Line five
and got this result:
In file: "GEN 0 GENERAL.html"
Look for anchor: "0_1_en.pdf"
Title found: "GEN 0.1 Preface"
EDIT: New faster method added
There is a simpler and faster method to solve this problem that, however, may fail if a line contains more than one tag:
@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"
for /F "delims=" %%c in ('findstr /C:"<a>" "%source%" ^| findstr /C:"%pdf%"') do (
    set "tag=%%c"
    rem Get the value of "<b>" sub-tag
    set "tag=!tag:<b>=$!"
    set "tag=!tag:</b>=$!"
    for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
    echo Title found: "!title!"
)
pause
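A possible variant of the same idea, sketched under the assumption that the real file contains ordinary `<a href="...">` anchors and that the links holding an image have no `<b>` tag: filter on the pdf name and on `<b>` instead of on a literal `<a>`. This has not been run against the OP's page, and `$` is still assumed to be absent from the titles.

```batch
@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"

rem Keep only the lines that mention the pdf AND contain a "<b>" tag;
rem /L forces a literal (non-regex) search.
for /F "delims=" %%c in ('findstr /L /C:"%pdf%" "%source%" ^| findstr /L /C:"<b>"') do (
    set "tag=%%c"
    rem Drop everything up to and including the first "<b>"
    set "tag=!tag:*<b>=!"
    rem Mark the closing tag with "$" and keep what precedes it
    set "tag=!tag:</b>=$!"
    for /F "delims=$" %%b in ("!tag!") do set "title=%%b"
    echo Title found: "!title!"
)
pause
```

Like the method above, this only handles one matching tag per line.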
Upvotes: 1
Reputation: 3611
I have changed my approach: I realized that it is better to look for the name of the pdf document first. This is my almost complete solution, but I would like to ask for help with the last point. The last replacement statement does not work; I need it to remove the closing b tag so that only the title remains.
@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%
rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
    set "line=%%c"
    REM Test if the line contains pdf file I look for:
    SET "pdfline=!line:%pdf%=!"
    if not "!pdfline!" == "!line!" (
        cls
        echo Line: !line!
        REM Test if the pdfline contains tag b
        SET "tagline=!pdfline:*><b>=!"
        if not "!tagline!" == "!pdfline!" (
            cls
            echo ACTUAL LINE: !tagline!
            REM Remove closing tag b
            SET "title=!tagline:</b*=!"
            echo TITLE: !title!
            pause
        )
    )
)
pause
BTW: The html page I work with is here.
So I am asking for help to complete/repair the line SET "title=!tagline:</b*=!"
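For reference, `set /?` documents the asterisk only at the beginning of the search string (`*str` matches everything up to the first `str`), so `</b*` is not treated as "from `</b` to the end". One possible workaround, shown here as a sketch on a hypothetical `tagline` value, is to turn `</b>` into a single delimiter character and keep the first token. The names `cut` and the `$` delimiter are assumptions for illustration; `$` must not occur in the title.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical value of tagline after the "*><b>" prefix has been removed.
set "tagline=GEN 0.1 Preface</b></a>"

rem Mark the closing tag with one delimiter character...
set "cut=!tagline:</b>=$!"

rem ...and keep only the text in front of that delimiter.
for /F "tokens=1 delims=$" %%t in ("!cut!") do set "title=%%t"
echo TITLE: !title!
```

With the sample value this should print `TITLE: GEN 0.1 Preface`.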
Upvotes: 0
Reputation: 3370
First, one important question: does this really have to be implemented via a CMD script? Would you be able to go with VBScript, PowerShell, C#, or some other scripting/programming language? CMD is a notoriously painful scripting environment.
Secondly, I'm not sure if this answers your question (it's a bit unclear), but here's a quick trick you can use in CMD to see if a given string contains another substring:
setlocal enableextensions enabledelayedexpansion
set PATTERN=somepattern
for /f "delims=" %%f in (somefile.txt) do (
    set CURRENT_LINE=%%f
    rem If removing the pattern changes the line, the line contained the pattern
    if "!CURRENT_LINE:%PATTERN%=!" neq "!CURRENT_LINE!" (
        echo Found pattern in line: %%f
    )
)
The idea is that you try to perform string replacement and see if anything was changed. This is certainly a hack, and it would be preferable if you could instead use a tool like findstr or grep, but if you're limited in your options, something like the above should work.
NOTE: I haven't actually run the above script excerpt, so let me know if you have any difficulty with it.
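For comparison, the findstr route mentioned above could look roughly like this. It is a minimal sketch; `somepattern` and `somefile.txt` are the same placeholder names as in the excerpt, and `FILE` is a variable introduced here for illustration.

```batch
@echo off
setlocal

rem Placeholder values, used only for illustration.
set "PATTERN=somepattern"
set "FILE=somefile.txt"

rem /L /C: searches for a literal string; /N prefixes each match with its line number.
findstr /L /N /C:"%PATTERN%" "%FILE%"

rem findstr sets ERRORLEVEL to 0 when at least one line matched.
if errorlevel 1 (
    echo Pattern not found, or the file could not be read
) else (
    echo Pattern found in "%FILE%"
)
```

This lets findstr do the matching and avoids the delayed-expansion loop, at the cost of not having each matching line in a variable.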
Upvotes: 0