cikatomo

Reputation: 1632

Extract the URL part of a line

I have an HTML page with many lines, and one of the lines is:

var premium_download_link = 'http://www.someurl.com/';

How can I find that line inside the HTML page and extract http://www.someurl.com from it?

Upvotes: 0

Views: 526

Answers (5)

glenn jackman

Reputation: 246754

grep -Po "(?<=premium_download_link = ')[^']+"
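A short usage sketch for the command above, in case it helps: -o prints only the matched text, and -P turns on Perl-compatible regular expressions, which is what makes the (?<=...) lookbehind possible. Both flags assume GNU grep built with PCRE support. grep reads standard input here, so the page has to be piped in or given as a file argument; the two-line input below is made up for illustration.

```shell
# Hypothetical two-line input standing in for the real HTML page.
grep_url=$(printf '%s\n' "<script>" "var premium_download_link = 'http://www.someurl.com/';" |
    grep -Po "(?<=premium_download_link = ')[^']+")

# The lookbehind anchors the match right after the opening quote,
# and [^']+ stops at the closing quote.
echo "$grep_url"
```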

Upvotes: 1

FatalError

Reputation: 54551

Using sed:

sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p"

The -n flag suppresses printing unless we explicitly print using p. Thus only matched (then substituted) lines are printed.
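To see the effect of -n together with p, here is a minimal sketch run against a small sample file. The file name page.html and its contents are assumptions made up for the example.

```shell
# Hypothetical sample page; page.html is an assumed file name.
cat > page.html <<'EOF'
<html>
<script>
var other_link = 'http://www.example.org/';
var premium_download_link = 'http://www.someurl.com/';
</script>
</html>
EOF

# -n suppresses sed's default output; the trailing p prints only the
# line on which the substitution succeeded.
sed_result=$(sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p" page.html)
echo "$sed_result"
```

Only the captured group between the quotes is printed; the other five lines produce no output at all.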

EDIT (based on OP comment):

To get this in a shell variable you might want something like:

url=$(wget -qO - "http://originalurl.com/" | sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p")

This fetches the page and pipes it through sed. The output should be the URL, which is stored in a shell variable named url.
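Since sed prints nothing when no line matches, it may be worth checking that the variable is non-empty before using it. A rough sketch follows; fetch_page is a hypothetical stand-in for the wget call so that the snippet can run offline.

```shell
# fetch_page is a hypothetical stand-in for:
#   wget -qO - "http://originalurl.com/"
fetch_page() {
    printf '%s\n' "<script>" "var premium_download_link = 'http://www.someurl.com/';" "</script>"
}

url=$(fetch_page | sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p")

# An empty variable means the line was not found (or the fetch failed).
if [ -z "$url" ]; then
    echo "premium_download_link not found" >&2
else
    echo "Download link: $url"
fi
```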

Upvotes: 2

jaypal singh

Reputation: 77085

With awk, you can extract specific fields by setting the field separator.

For instance, the following should work:

$ echo "var premium_download_link = 'http://www.someurl.com/';" | 
awk -F"'" '{ print $2 }' 
http://www.someurl.com/

However, your HTML file will likely have other content, so you can put a regex pattern in front of the action to ensure that it runs only on lines that match.

For example:

awk -F"'" '/premium_download_link/{ print $2 }' 
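One thing to watch for: the pattern matches anywhere in the line, and $2 is simply the first quoted string on that line. The sketch below compares the pattern above with a slightly stricter one that also requires the var keyword and the = sign and stops after the first hit. The sample file and the stricter pattern are my own assumptions, not part of the original page.

```shell
# Hypothetical page where the variable name also appears in ordinary markup.
cat > page.html <<'EOF'
<p class='note'>premium_download_link is set below</p>
<script>
var premium_download_link = 'http://www.someurl.com/';
</script>
EOF

# Loose pattern: also fires on the <p> line and prints "note" first.
loose=$(awk -F"'" '/premium_download_link/ { print $2 }' page.html)

# Stricter pattern: requires the assignment itself, then exits after one match.
strict=$(awk -F"'" '/var premium_download_link *=/ { print $2; exit }' page.html)

echo "$strict"
```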

Upvotes: 2

Tedee12345

Reputation: 1210

echo "var premium_download_link = 'http://www.someurl.com/'" | awk '{print substr ($4,2,23)}'
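Note that 23 is the length of this particular URL, so the command above only works for URLs of exactly that length. A possible length-independent variation, offered as a sketch rather than the answerer's method, locates the quotes with index() instead of hardcoding the length:

```shell
line="var premium_download_link = 'http://www.someurl.com/';"

# q holds a single quote so it does not have to be escaped inside the awk program.
# rest is everything after the first quote; the URL ends just before the next quote.
substr_url=$(echo "$line" | awk -v q="'" '{
    rest = substr($0, index($0, q) + 1)
    print substr(rest, 1, index(rest, q) - 1)
}')

echo "$substr_url"
```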

Upvotes: 3

alain.janinm

Reputation: 20065

With awk:

awk -F "'" '{ for (f=1; f<=(NF-1)/2; f++) print $(f*2) }' $1

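-F "'" defines the single quote ' as the field separator for the input. The loop then prints every even-numbered field, that is, the text between each pair of quotes.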
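A small sketch of what the loop does on a line with more than one quoted string. The input line is made up for illustration. In the command above, $1 is the shell's first positional parameter, so it assumes the one-liner sits in a script that is called with the file name as its argument.

```shell
# Made-up input with two quoted strings on one line.
# Splitting on ' gives 5 fields, so the loop prints fields 2 and 4.
all_quoted=$(printf '%s\n' "a = 'one'; b = 'two';" |
    awk -F "'" '{ for (f=1; f<=(NF-1)/2; f++) print $(f*2) }')

echo "$all_quoted"
```

Because there is no pattern in front of the action, this prints the quoted strings from every line of the input, not just the premium_download_link line.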

Upvotes: 2
