Reputation: 1632
I have an HTML page with many lines, and one of the lines is:
var premium_download_link = 'http://www.someurl.com/';
How can I find that line inside the HTML page and extract http://www.someurl.com from it?
Upvotes: 0
Views: 526
Reputation: 54551
Using sed:
sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p"
The -n flag suppresses automatic printing, so a line is printed only when we ask for it with the p flag of the s command. Thus only lines that matched (and were therefore substituted) are printed.
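To see the -n/p behaviour on a page with more than one line, here is a small sketch. The helper name extract_link and the sample HTML around the target line are made up for illustration; the sed expression is the one from this answer.

```shell
#!/bin/sh
# extract_link is a hypothetical helper name: it reads HTML on stdin and
# prints the quoted value of premium_download_link.
# -n turns off automatic printing; the p flag on s/// prints only the
# lines where the substitution actually happened.
extract_link() {
  sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p"
}

# Made-up sample page: only one of these lines matches the pattern.
extract_link <<'EOF'
<html><head><script>
var free_download_link = 'http://www.example.com/free/';
var premium_download_link = 'http://www.someurl.com/';
</script></head></html>
EOF
```

Only http://www.someurl.com/ is printed; the free_download_link line and the markup lines produce no output because the substitution never fires on them.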
EDIT (based on OP comment):
To get this in a shell variable you might want something like:
url=$(wget -qO - "http://originalurl.com/" | sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p")
This fetches the page and runs it through sed. The output should be the URL, which gets stored in a variable named url.
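If the line is missing (or the page failed to download), url silently ends up empty. A possible variation, sketched below, keeps the same sed filter but takes only the first match and reports failure through the exit status. The function name get_premium_url is hypothetical, and a literal line stands in for the wget output so the sketch runs offline.

```shell
#!/bin/sh
# get_premium_url is a hypothetical helper: it reads the page on stdin,
# prints the first premium_download_link value, and returns non-zero
# (with a message on stderr) when no such line exists.
get_premium_url() {
  found=$(sed -n -e "s/.*var premium_download_link = '\([^']*\)';.*/\1/p" | head -n 1)
  if [ -z "$found" ]; then
    echo "premium_download_link not found" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Real usage would pipe wget into it, e.g.:
#   url=$(wget -qO - "http://originalurl.com/" | get_premium_url) || exit 1
# Here a literal line stands in for the downloaded page.
url=$(echo "var premium_download_link = 'http://www.someurl.com/';" | get_premium_url) || exit 1
echo "$url"
```

The exit status of url=$(...) is the exit status of the command substitution, which is why the || exit 1 works on the assignment.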
Upvotes: 2
Reputation: 77085
With awk, you can extract specific field values by setting the field separator (the FS variable, or -F on the command line).
For instance, the following should work:
$ echo "var premium_download_link = 'http://www.someurl.com/';" |
awk -F"'" '{ print $2 }'
http://www.someurl.com/
However, your HTML file probably contains other content, so you can put a regex pattern in front of the action block to make it run only on the line you want.
For example:
awk -F"'" '/premium_download_link/{ print $2 }'
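Here is a sketch of the pattern filter on input that also contains other single-quoted text. The helper name premium_awk and the sample lines are illustrative assumptions, and the added exit (stop after the first match) is an optional tweak, not part of the command above.

```shell
#!/bin/sh
# premium_awk is a hypothetical helper name.
# /premium_download_link/ restricts the action to matching lines, so the
# quoted href below is ignored; exit stops after the first match.
premium_awk() {
  awk -F"'" '/premium_download_link/ { print $2; exit }' "$@"
}

premium_awk <<'EOF'
<a href='http://www.example.com/about/'>About</a>
var premium_download_link = 'http://www.someurl.com/';
var premium_download_link = 'http://www.someurl.com/mirror/';
EOF
```

Only the first URL, http://www.someurl.com/, is printed. Without the pattern, the href line would also print its quoted value.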
Upvotes: 2
Reputation: 1210
echo "var premium_download_link = 'http://www.someurl.com/'" | awk '{print substr($4, 2, length($4) - 2)}'
Upvotes: 3
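The substr approach depends on the URL being the fourth whitespace-separated field and on whether a trailing ; is present. A possible alternative, sketched here, uses POSIX awk's match() to locate the first single-quoted string and RSTART/RLENGTH to slice out its contents. The helper name quoted_value is hypothetical; the quote character is passed in with -v to avoid shell-quoting gymnastics.

```shell
#!/bin/sh
# quoted_value is a hypothetical helper: for lines mentioning
# premium_download_link, print the text inside the first pair of
# single quotes. The regex is built at run time as '[^']*'.
quoted_value() {
  awk -v q="'" '
    /premium_download_link/ && match($0, q "[^" q "]*" q) {
      # Skip the opening quote and drop both quotes from the length.
      print substr($0, RSTART + 1, RLENGTH - 2)
    }'
}

echo "var premium_download_link = 'http://www.someurl.com/';" | quoted_value
```

This prints http://www.someurl.com/ with or without the trailing semicolon and regardless of the URL's length.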
Reputation: 20065
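With awk:
awk -F "'" '{ for (f=1; f<=(NF-1)/2; f++) print $(f*2) }' "$1"
-F "'" sets the single quote ' as the field separator for the input, so every quoted string lands in an even-numbered field ($2, $4, ...), and the loop prints each of them. Here "$1" is the first argument of the enclosing shell script, i.e. the HTML file name.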
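To see which fields the loop visits, here is a sketch on a line with two quoted strings. The wrapper name all_quoted is hypothetical; it passes its arguments through as file names and falls back to stdin when there are none.

```shell
#!/bin/sh
# all_quoted is a hypothetical wrapper around the command above.
# With ' as the separator, "var a = 'one', b = 'two';" splits into
#   $1="var a = "  $2="one"  $3=", b = "  $4="two"  $5=";"
# so NF=5 and the loop prints $2 and $4.
all_quoted() {
  awk -F "'" '{ for (f = 1; f <= (NF - 1) / 2; f++) print $(f * 2) }' "$@"
}

echo "var a = 'one', b = 'two';" | all_quoted
```

Note that this prints every quoted string on every line, so on a real page you would still want a pattern such as /premium_download_link/ to narrow it down.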
Upvotes: 2