andy

Reputation: 391

Extract part of a curl return in Bash to allocate to a variable

I would like to extract a string value from a web page returned by curl in a Bash script, but I am unsure how to go about it.

The page returned by curl always looks like this:

    <head>
    <title>UKIPVPN.COM FREE VPN Service</title>
    <style type='text/css'>
      #button {
        width:180px;
        height:60px;
        font-family:verdana,arial,helvetica,sans-serif;
        font-size:20px;
        font-weight: bold;
      }
    </style>
    </head>
    <br>
    <br>
     <font color=blue><center>  <h1>Welcome to Free UK IP VPN Service</h1>               </center></font>

     <form method='post' action='http://www.ukipvpn.com'>
    <center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>
    <center><input id='button' type='submit' value='  I AGREE  ' /><br><br>     <h2> Your TOS Let me use the Free VPN Service</h2></center>
     </form>



       <br><center><font size='2'>No illegal activities allowed. In case of abuse, users' VPN access log is subjected to expose to related authorities.</font></center>
       </html>

The value I would like to extract into a Bash variable is the text between the quotes in value='this is the value I am interested in'.

Thanks for any help,

Andy

Upvotes: 0

Views: 1136

Answers (2)

that other guy

Reputation: 123410

There are some arguments against using regex to parse HTML.

Here's a more robust XPath-based version using tidy and xmlstarlet:

    var=$(curl someurl |
      tidy -asxml 2> /dev/null |
      xmlstarlet sel -t -v '//_:input[@name="sessionid"]/@value' 2> /dev/null)

Upvotes: 1

Avinash Raj

Reputation: 174696

You could try the following. It needs a grep with PCRE support (the -P option, as in GNU grep):

    val=$(curl somelink | grep -oP "name='sessionid'[^<>]*\bvalue\s*=\s*'\K[^']*")
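
Where grep has no -P option (the BSD grep on macOS, for instance), a plain sed substitution may do a similar job. The following is only a sketch: the html variable is a hypothetical stand-in holding one line of the page from the question rather than live curl output, and the pattern assumes the attributes keep that order and quoting.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the curl output: the input line from the question.
html="<center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>"

# -n suppresses normal output; the trailing p prints only lines where the
# substitution succeeded, reduced to the captured text between the quotes.
val=$(printf '%s\n' "$html" |
  sed -n "s/.*name='sessionid'[^<>]*value='\([^']*\)'.*/\1/p")

# Guard against an empty result, e.g. if the page layout changes.
if [ -z "$val" ]; then
  echo "sessionid not found" >&2
else
  echo "sessionid: $val"
fi
```

With real input, the printf line would be replaced by the curl call piped into the same sed command. Like any regex approach, this breaks if the markup changes, so checking for an empty result before using the variable seems prudent.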

Upvotes: 1
