Deepak Verma

Reputation: 55

How to get input field value from a form using preg_match()

I am not an expert in this field, so please bear with me. I am using cURL to fetch a page and want to get the value of a hidden <input> field. I am not familiar with regular expressions. My code is below:

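// ... cURL handle $ch is set up earlier (not shown) ...
$page = curl_exec($ch);
curl_close($ch);

function parse_form_fields($page, $username, $password){
    $parameters = array();

    // Capture the contents of the value attribute of the hidden signuptoken input
    if (preg_match('/<input id="signuptoken" type="hidden" value="(.+?)" name="signuptoken"/', $page, $m)) {
        $captchatoken = $m[1];
        $parameters[] = "newaccounttoken=" . urlencode($captchatoken);
    }

    return $parameters;
}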

The form field looks like this:

<input id="signuptoken" type="hidden" value="03AHJ_Vuv2ts6ev2LltAkZB91vjD6k-BsW3286bTC9QZYZLSHQUMNDQJFUaNmAQMAYb9FDhIkOFzAisafasfsTZuv_pl5KvkYNfsGUPcOAEX5YPlMaMOi7MZJq4ky0v_GyM60SmMgjPrtfZSJYE0hqw--GsfsafasmER0Sksr6OAvnLnBVAMsKcCi7uM" name="signuptoken">

I want to extract the value of this input field.

Upvotes: 1

Views: 13647

Answers (3)

Ranty

Reputation: 3362

Don't use patterns like value=\"(.+?)\"; they can cause a lot of trouble with badly formatted HTML. Use something more restrictive, such as value=\"([^\">]+?)\". The difference is that . matches almost any character (including quotes and >), whereas [^">] always stops at a closing quote or at the end of the tag.
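The difference is easiest to see on input where the first value attribute is empty. The sketch below uses a made-up two-tag string (not the real page) and runs both patterns against it:

```php
<?php
// Made-up sample: the first input has an empty value attribute.
$html = '<input id="a" value="" name="b"><input id="c" value="tok" name="signuptoken">';

// Loose pattern: .+? may run across quotes and tags until the suffix finally matches.
preg_match('/value="(.+?)" name="signuptoken"/', $html, $loose);

// Restrictive pattern: [^">] cannot cross a quote or the end of a tag.
preg_match('/value="([^">]+?)" name="signuptoken"/', $html, $strict);

echo $loose[1], "\n";  // prints: " name="b"><input id="c" value="tok
echo $strict[1], "\n"; // prints: tok
```

The loose pattern starts matching at the first value=" it sees and swallows everything up to the token, while the restrictive one fails at the empty value and only succeeds on the correct tag.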

The problem in your case might be the missing s (dotall) modifier, which makes . match newlines as well. Try preg_match('/<input id="signuptoken" type="hidden" value="(.*?)"/s', $page, $m);
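A minimal sketch of what the s modifier changes, assuming a hypothetical value that contains a line break (the sample string is invented for illustration):

```php
<?php
// Hypothetical input whose value attribute contains a line break.
$page = "<input id=\"signuptoken\" type=\"hidden\" value=\"abc\ndef\" name=\"signuptoken\">";

// Without s, . does not match "\n", so the lazy group can never reach the closing quote.
$withoutS = preg_match('/value="(.*?)"/', $page, $m1);

// With s (dotall), . matches "\n" as well.
$withS = preg_match('/value="(.*?)"/s', $page, $m2);

var_dump($withoutS);             // int(0)
var_dump($m2[1] === "abc\ndef"); // bool(true)
```

Note that s only affects the dot. If the real page instead has line breaks between the attributes, the literal spaces in the pattern still won't match them.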

Other than that, I second the advice to use a DOM parser.

Also, save the page's HTML to a file and test your regex against that local copy instead of requesting the page every time.
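The "fetch once, then work from the saved file" idea can be sketched as a small wrapper. The function name get_page_cached, the URL and the cache path are hypothetical, and the code assumes the cURL extension is installed:

```php
<?php
// Hypothetical helper: return the saved copy if it exists, otherwise fetch the page once and save it.
// Returns the HTML string, or false if the cURL request failed.
function get_page_cached(string $url, string $cacheFile)
{
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile); // no network request on later runs
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
    $html = curl_exec($ch);                         // false on failure; handle is freed when $ch goes out of scope

    if ($html !== false) {
        file_put_contents($cacheFile, $html);
    }
    return $html;
}

// Usage (placeholder URL and path):
// $page = get_page_cached('https://example.com/signup', __DIR__ . '/signup.html');
```

Delete the cache file whenever you want a fresh copy of the page.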

Upvotes: 0

Jeune

Reputation: 3548

You're better off using DOMDocument. For example:

$html = '<input id="signuptoken" type="hidden" value="03AHJ_Vuv2ts6ev2LltAkZB91vjD6k-BsW3286bTC9QZYZLSHQUMNDQJFUaNmAQMAYb9FDhIkOFzAisafasfsTZuv_pl5KvkYNfsGUPcOAEX5YPlMaMOi7MZJq4ky0v_GyM60SmMgjPrtfZSJYE0hqw--GsfsafasmER0Sksr6OAvnLnBVAMsKcCi7uM" name="signuptoken">';
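$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$dom->loadHTML($html);

$signuptoken = $dom->getElementById("signuptoken");
if ($signuptoken !== null) {
    echo $signuptoken->getAttribute('value');
}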
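getElementById only works if the input has an id. A sketch using DOMXPath instead, which selects the hidden input by its name attribute and does not care about attribute order. The helper name find_hidden_value is hypothetical, and the code assumes the name contains no double quotes, since it is inserted into the XPath expression unescaped:

```php
<?php
// Hypothetical helper: value of the hidden <input> with the given name, or null if none exists.
// Assumes $name contains no double quotes (it is concatenated into the XPath query as-is).
function find_hidden_value(string $html, string $name)
{
    $dom = new DOMDocument();

    // Collect libxml warnings about messy HTML instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//input[@type="hidden"][@name="' . $name . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->getAttribute('value');
}

$html = '<form><input value="abc123" name="signuptoken" type="hidden"></form>';
var_dump(find_hidden_value($html, 'signuptoken')); // string(6) "abc123"
```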

Upvotes: 5

Adam

Reputation: 2889

This should work for you to find the value:

<?php
$input  = '<input id="signuptoken" type="hidden" value="03AHJ_Vuv2ts6ev2LltAkZB91vjD6k-BsW3286bTC9QZYZLSHQUMNDQJFUaNmAQMAYb9FDhIkOFzAisafasfsTZuv_pl5KvkYNfsGUPcOAEX5YPlMaMOi7MZJq4ky0v_GyM60SmMgjPrtfZSJYE0hqw--GsfsafasmER0Sksr6OAvnLnBVAMsKcCi7uM" name="signuptoken">';

$result = preg_match('/<input id="signuptoken" type="hidden" value="(.*?)"/', $input, $matches);
if(!$result){
    // Could not find input
} else {
    // Input value found
    echo 'Value: '.$matches[1];
}

Parsing HTML with regex is not very resilient, however: simply swapping the order of the id and type attributes in the input tag will break the scraper. If you're sure the HTML will never change, that shouldn't be an issue, but be aware that a DOM parser may be more robust in some cases.
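If you have to stay with preg_match, one way to stop depending on attribute order is a two-step match: first find the whole tag by its name attribute, then read value from inside that tag only. This is a sketch with a hypothetical helper name (extract_input_value), and it is still fragile: it assumes double-quoted attribute values with no > inside them, and because \b treats a hyphen as a boundary, attributes such as data-name= or data-value= would also match.

```php
<?php
// Hypothetical helper: find the <input> tag by its name attribute, then extract value from that tag.
// Assumes double-quoted attribute values that contain no ">" characters.
function extract_input_value(string $html, string $name)
{
    $quotedName = preg_quote($name, '/');

    // Step 1: match the whole <input ...> tag whose name attribute equals $name, wherever it appears.
    if (!preg_match('/<input\b[^>]*\bname="' . $quotedName . '"[^>]*>/i', $html, $tag)) {
        return null;
    }

    // Step 2: pull the value attribute out of that single tag.
    if (!preg_match('/\bvalue="([^"]*)"/i', $tag[0], $m)) {
        return null;
    }

    return $m[1];
}

$a = '<input id="signuptoken" type="hidden" value="abc" name="signuptoken">';
$b = '<input name="signuptoken" type="hidden" id="signuptoken" value="abc">';
var_dump(extract_input_value($a, 'signuptoken')); // string(3) "abc"
var_dump(extract_input_value($b, 'signuptoken')); // string(3) "abc"
```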

Upvotes: 2
