John Boe

Reputation: 3611

Get string from file in batch

Task in CMD.

1) How can I test whether a string contains another string? I checked the manual here, under 'Boolean Test "does string exist ?"', but either I can't understand the example or it does not work for me. The piece of code below is just an attempt: I am trying to filter lines by checking whether a line contains an <a> tag.

FOR /f "tokens=* delims= usebackq" %%c in ("%source%") DO ( 
echo %%c
IF %%c == "<a" (pause) 
)

So while the file is being read, the script should pause whenever there is a link on a line.

2) I have one more question. I need to filter the lines where the link points to a specific file, and get the content of the link. My original idea was to use findstr with a regex, but it does not seem to support sub-patterns. The next problem would be how to get the result into a variable.

set "pdf=0_1_en.pdf"
type "%source%" | grep "%pdf%" | findstr /r /c:"%pdf%.*>(.*).*</a>"

So in summary, I want to go through the file, and if there is a link like this: REPAIRED: *

<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>

I want to get the title GEN 0.1 Preface. Note that there are also similar links with the same target that contain an image, not text, inside the tag.

Code based on Aacini's answer, still to be changed a little:

@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%

rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
   set "line=%%c"
   rem Test if the line contain a "tag" that start with "<a" string:
   set "tag=!line:*<a=!"
   if not "!tag!" == "!line!" (
      rem Take the string in tag that end in ">"
      for /F "delims=^>" %%a in ("!tag!") do set "link=%%a"
      echo Link found: !link!
      if "!link!" == "GEN 0.1 Preface" echo Seeked link found
   )
)
pause

Still not finished

Upvotes: 0

Views: 1452

Answers (3)

Aacini

Reputation: 67216

Although your question is extensive, it does not provide many details, so I assumed several points because I don't know much about .PDF files, tags, etc.

@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"

rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
   set "line=%%c"
   rem Test if the line contain "<a>" tag:
   set "tag=!line:*<a>=!"
   if not "!tag!" == "!line!" (
      rem Test if "<a>" tag contain the anchor pdf file:
      if not "!tag:%pdf%=!" == "!tag!" (
         rem Get the value of "<b>" sub-tag
         set "tag=!tag:<b>=$!"
         set "tag=!tag:</b>=$!"
         for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
         echo Title found: "!title!"
      )
   )
)
pause

Any missing point can be added or fixed, if you give me precise details about them.

EDIT: I fixed the program above following the latest indications from the OP. I used the $ character to extract the Title value; if this character may appear in the original tag, it must be replaced by another unused one.

I tested this program with this "GEN 0 GENERAL.html" example file:

Line one
<a>href="/Dokumenter/EK_GEN_0_X_en.pdf" class="uline"><b>GEN 0.X Preface</b></a>
Line three
<a>href="/Dokumenter/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>
Line five

and got this result:

In file: "GEN 0 GENERAL.html"
Look for anchor: "0_1_en.pdf"
Title found: "GEN 0.1 Preface"

EDIT: New faster method added

There is a simpler and faster method to solve this problem that, however, may fail if a line contains more than one tag:

@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"

for /F "delims=" %%c in ('findstr /C:"<a>" "%source%" ^| findstr /C:"%pdf%"') do (
   set "tag=%%c"
   rem Get the value of "<b>" sub-tag
   set "tag=!tag:<b>=$!"
   set "tag=!tag:</b>=$!"
   for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
   echo Title found: "!title!"
)
pause

Upvotes: 1

John Boe

Reputation: 3611

I have changed my approach. I realized that it is better to find the name of the PDF document first. This is my almost complete solution, but I would like help with the last point. The last replacement statement does not work; I need to remove the closing </b> tag so that only the title remains.

@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%

rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
   set "line=%%c"
   REM Test if the line contains pdf file I look for:
   SET "pdfline=!line:%pdf%=!"


   if not "!pdfline!" == "!line!" (

      cls     
      echo Line: !line!

      REM Test if the pdfline contains tag b
      SET "tagline=!pdfline:*><b>=!"

      if not "!tagline!" == "!pdfline!" (

         cls     
         echo ACTUAL LINE: !tagline!

         REM Remove closing tag b
         SET "title=!tagline:</b*=!"
         echo  TITLE: !title!
         pause
      )
   )
)
pause

BTW: The html page I work with is here.

So I ask you to help complete/repair line SET "title=!tagline:</b*=!"
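
A possible way to finish that line, offered only as a sketch: in the !var:search=replace! syntax the asterisk is only treated as a wildcard at the start of the search string, so "</b*" is taken literally and never matches. One workaround, borrowing the $ delimiter idea from Aacini's answer, is to replace the closing tag with a marker character and keep the first token. The sample value of tagline and the helper variable "marked" are hypothetical, and the sketch assumes the title contains neither $ nor an exclamation mark.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample standing in for the !tagline! value built in the script above
set "tagline=GEN 0.1 Preface</b></a>"

rem Replace the closing tag with a marker assumed never to occur in a title
set "marked=!tagline:</b>=$!"

rem Keep only the text before the first marker
for /F "delims=$" %%t in ("!marked!") do set "title=%%t"

echo TITLE: !title!
```

With the sample value this should print "TITLE: GEN 0.1 Preface". Inside the original loop, only the two lines that build "marked" and "title" would be needed in place of the failing SET.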

Upvotes: 0

reuben

Reputation: 3370

First, one important question: does this really have to be implemented via a CMD script? Would you be able to go with VBScript, PowerShell, C#, or some other scripting/programming language? CMD is a notoriously painful scripting environment.

Secondly, I'm not sure if this answers your question--it's a bit unclear--but here's a quick trick you can use in CMD to see if a given string contains another substring:

setlocal enableextensions enabledelayedexpansion

set PATTERN=somepattern

for /f "delims=" %%f in (somefile.txt) do (
    set CURRENT_LINE=%%f
    if "!CURRENT_LINE:%PATTERN%=!" neq "!CURRENT_LINE!" (
        echo Found pattern in line: %%f
    )
)

The idea is that you try to perform string replacement and see if anything was changed. This is certainly a hack, and it would be preferable if you could instead use a tool like findstr or grep, but if you're limited in your options, something like the above should work.
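
For comparison, here is a rough sketch of the findstr route for a whole-file check. It assumes the same placeholder names as the excerpt above (PATTERN, somefile.txt) and relies on findstr setting errorlevel to 0 when at least one line matches and to 1 otherwise:

```batch
@echo off
setlocal

rem Placeholder values mirroring the excerpt above
set "PATTERN=somepattern"

rem /L treats the search string literally, /C: keeps it as one string even if it contains spaces
findstr /L /C:"%PATTERN%" "somefile.txt" >nul
if errorlevel 1 (
    echo Pattern not found
) else (
    echo Pattern found
)
```

This only tells you whether the file contains the pattern at all. To act on each matching line, the findstr command can be placed inside a for /F loop instead.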

NOTE: I haven't actually run the above script excerpt, so let me know if you have any difficulty with it.

Upvotes: 0
