theboy

Reputation: 353

Extracting URL from text file in Batch

I have a script that needs to extract a YouTube URL from a text file.

Here's what I have in the text file (output.txt):

  ---------- NUMBER11.TXT
              
<link itemprop="url" href="http://www.youtube.com/channel/UCnxGkOGNMqQEUMvroOWps6Q">

Note that the text file starts with an empty line, which is annoying, and the URL is on line 3. Something that doesn't show up in this site's formatting is that there are also 11 spaces at the start of the line containing the href. I'd like to separate the URL from the rest of the junk.

I've tried something like this:

set /p long= < output.txt
echo %long%

set short1=%long:^<link itemprop^="url" href^="=%
echo %short1% > o1.txt

I thought this would remove the selected text from the file, but I think this is a little over my head.

I generate output.txt by first fetching a YouTube video page with curl, and then running this find command on the result:

find "href=""http://www.youtube.com/channel/" %vd% > output.txt

Maybe I'm making this more complicated than it is?

Upvotes: 0

Views: 2056

Answers (3)

Compo

Reputation: 38622

I would suggest you parse the results directly from your curl command instead of outputting them to a text file, and then using find against that output.

However, instead of find.exe, I would suggest using findstr.exe. The following method gets the URL from any line containing href= followed by "http: or "https, and subsequently by youtube.com.

@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
For /F Tokens^=*EOL^= %%G In (
    '%__APPDIR__%findstr.exe /IR "href=\"http[s:].*youtube\.com" "output.txt"'
) Do (Set "Line=%%G" & SetLocal EnableDelayedExpansion
    For /F Tokens^=2Delims^=^" %%H In ("!Line:*href=!") Do EndLocal & Echo %%H)
Pause

If you want the output stored as a variable, instead of Echoing it, change Echo %%H to Set "URL=%%H". You could then use %URL%, (or "%URL%" if you need it doublequoted), elsewhere in your script.
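As a rough sketch of both suggestions combined (reading from curl directly and storing the result in a variable), something like the following might work. The VideoURL value is a hypothetical placeholder, and curl.exe is assumed to be on your PATH; neither comes from the question. It also assumes the downloaded page still contains a matching line short enough for findstr.exe and For /F to handle. If several lines match, URL keeps the last one.

```batch
@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
Rem Hypothetical placeholder: replace with the video page you already fetch.
Set "VideoURL=https://www.youtube.com/watch?v=VIDEO_ID"
Set "URL="
Rem The pipe is escaped as ^| so that it runs inside the For /F command,
Rem and findstr.exe reads the curl output from standard input.
For /F Tokens^=*EOL^= %%G In (
    'curl.exe -s "%VideoURL%" ^| %__APPDIR__%findstr.exe /IR "href=\"http[s:].*youtube\.com"'
) Do (Set "Line=%%G" & SetLocal EnableDelayedExpansion
    For /F Tokens^=2Delims^=^" %%H In ("!Line:*href=!") Do EndLocal & Set "URL=%%H")
Rem URL stays undefined when no line matched.
If Defined URL (Echo URL=%URL%) Else Echo No matching link found.
Pause
```

The closing parenthesis of the In clause is kept on its own line, as in the listing above, because the escaped doublequote in the search string leaves the command line with an odd number of doublequotes.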

Upvotes: 0

Magoo

Reputation: 80033

@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "filename1=%sourcedir%\q64572433.txt"

set "url="
FOR /f "usebackq tokens=4,5 delims=>= " %%a IN ("%filename1%") DO if "%%~a"=="href" set "url=%%~b"

echo URL=%url%

GOTO :EOF

You would need to change the setting of sourcedir to suit your circumstances. The listing uses a setting that suits my system.

I used a file named q64572433.txt containing your data for my testing.

The for command tokenises each line of the file, using =, > and space as delimiters (the 3 characters between delims= and ")

On the line of interest, token 4 would be href and token 5 the url - and this is the only line where href is the fourth token. When that is detected, assign the 5th token (in %%b) to the variable, removing the quotes with ~ for good measure.
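To see the tokenisation for yourself, a small demonstration along these lines may help. It is only a sketch: the scratch file name tokens_demo.txt is made up for the example, and it uses tokens=1-5 (assigned to %%a through %%e) rather than the 4,5 used in the listing, so that every token is visible.

```batch
@ECHO OFF
SETLOCAL
REM Hypothetical scratch file, used only for this demonstration.
SET "demo=%TEMP%\tokens_demo.txt"
REM Write the sample line; ^< and ^> are escaped so they are not read as redirection.
>"%demo%" ECHO            ^<link itemprop="url" href="http://www.youtube.com/channel/UCnxGkOGNMqQEUMvroOWps6Q"^>

REM Split the line on ">", "=" and space, and show the first five tokens.
FOR /f "usebackq tokens=1-5 delims=>= " %%a IN ("%demo%") DO (
    ECHO token 1: %%a
    ECHO token 2: %%b
    ECHO token 3: %%c
    ECHO token 4: %%d
    ECHO token 5: %%e
    ECHO token 5 without quotes: %%~e
)
DEL "%demo%"
GOTO :EOF
```

With the sample line, the first four tokens should come out as <link, itemprop, "url" and href, with the quoted URL as the fifth. The leading spaces are skipped because space is one of the delimiters.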

Upvotes: 0

user7818749

Using a batch file to parse text that contains special characters, such as the redirection symbols < and >, can cause problems, so it is not recommended. That said, given your exact example, here is one way. If your real data differs from what you posted, which I strongly suspect it does, then this probably will not work.

@echo off
setlocal enabledelayedexpansion
for /f "usebackq delims=" %%i in ("output.txt") do for %%a in (%%i) do (
    set "var=%%~a"
    set "var=!var:>=!"
    set "var=!var:"=!"
    if "!var:~0,4!" == "http" echo !var!
)

Upvotes: 1
