Reputation: 391
I would like to extract a string value from a web page fetched with curl in a Bash script, but I am unsure how to go about it.
The page returned by curl always looks like this:
<head>
<title>UKIPVPN.COM FREE VPN Service</title>
<style type='text/css'>
#button {
width:180px;
height:60px;
font-family:verdana,arial,helvetica,sans-serif;
font-size:20px;
font-weight: bold;
}
</style>
</head>
<br>
<br>
<font color=blue><center> <h1>Welcome to Free UK IP VPN Service</h1> </center></font>
<form method='post' action='http://www.ukipvpn.com'>
<center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>
<center><input id='button' type='submit' value=' I AGREE ' /><br><br> <h2> Your TOS Let me use the Free VPN Service</h2></center>
</form>
<br><center><font size='2'>No illegal activities allowed. In case of abuse, users' VPN access log is subjected to expose to related authorities.</font></center>
</html>
The value I would like to extract into a Bash variable is the value attribute of the hidden sessionid input (4b5q43mhhgl95nsa9v9lg8kac7 in the example above).
Thanks for any help,
Andy
Upvotes: 0
Views: 1136
Reputation: 123410
There are some arguments against using regex to parse HTML.
Here's a more robust XPath-based version using tidy and xmlstarlet:
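# tidy converts the HTML to well-formed XHTML; xmlstarlet then selects the attribute.
# The "_:" prefix matches elements in the document's default (XHTML) namespace.
var=$(curl -s someurl |
tidy -asxml 2> /dev/null |
xmlstarlet sel -t -v '//_:input[@name="sessionid"]/@value' 2> /dev/null)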
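As a rough illustration of why plain pattern matching on HTML is fragile, the sketch below runs a deliberately naive sed pattern against two equivalent forms of the input element. The function name `naive_extract` and the reordered, double-quoted variant are hypothetical, made up for this example; only the first line comes from the question.

```bash
#!/usr/bin/env bash
# Naive pattern: assumes name comes directly before value and that
# both attributes use single quotes.
naive_extract() {
  sed -n "s/.*name='sessionid' value='\([^']*\)'.*/\1/p"
}

# The element as it appears in the question.
original="<input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'>"
# Hypothetical variant: same element, attributes reordered and double-quoted.
reordered='<input value="4b5q43mhhgl95nsa9v9lg8kac7" name="sessionid" type="hidden">'

from_original=$(printf '%s\n' "$original" | naive_extract)
from_reordered=$(printf '%s\n' "$reordered" | naive_extract)

echo "original:  '${from_original}'"
echo "reordered: '${from_reordered}'"
```

The pattern finds the value in the first form and nothing in the second, even though a browser treats both the same way. An XPath query on the attribute name does not depend on attribute order or quoting style.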
Upvotes: 1
Reputation: 174696
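You could try the following, which needs GNU grep for the -P (PCRE) option. The \K discards everything matched before it, so only the value itself is printed: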
$ val=$(curl somelink | grep -oP "name='sessionid'[^<>]*\bvalue\s*=\s*'\K[^']*")
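To try the pattern without touching the network, it can be run against the form from the question. This is a minimal sketch, assuming GNU grep built with PCRE support; the `html` variable is only a stand-in for curl's output.

```bash
#!/usr/bin/env bash
# Stand-in for the output of curl: the form from the question.
html="<form method='post' action='http://www.ukipvpn.com'>
<center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>
<center><input id='button' type='submit' value=' I AGREE ' /></center>
</form>"

# -o prints only the matched text, -P enables PCRE. \K drops everything
# matched so far, so only the characters inside the quotes are printed.
val=$(printf '%s\n' "$html" |
  grep -oP "name='sessionid'[^<>]*\bvalue\s*=\s*'\K[^']*")

echo "$val"
```

The submit button also has a value attribute, but it is skipped because the match must start at name='sessionid' inside the same tag.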
Upvotes: 1