Reputation: 55
I am not an expert in this field, so please bear with me.
I am fetching a page with cURL and want to get the value of a hidden <input>
field. I am not familiar with regular expressions. My code is below:
$page = curl_exec($ch);
}
curl_close($ch);
function parse_form_fields($page, $username, $password){
preg_match("/<input id=\"signuptoken\" type=\"hidden\" value=\"(.+?)\" name=\"signuptoken\"/", $page, $m);
$captchatoken = $m[1];
$parameters[] = "newaccounttoken=" . urlencode($captchatoken);
}
The form field looks like this:
<input id="signuptoken" type="hidden" value="03AHJ_Vuv2ts6ev2LltAkZB91vjD6k-BsW3286bTC9QZYZLSHQUMNDQJFUaNmAQMAYb9FDhIkOFzAisafasfsTZuv_pl5KvkYNfsGUPcOAEX5YPlMaMOi7MZJq4ky0v_GyM60SmMgjPrtfZSJYE0hqw--GsfsafasmER0Sksr6OAvnLnBVAMsKcCi7uM" name="signuptoken">
I want to get the value out for this input field.
Upvotes: 1
Views: 13647
Reputation: 3362
Don't use patterns like value=\"(.+?)\"; they can cause a lot of trouble with badly formatted HTML. Use something more restrictive, like value=\"([^\">]+?)\". The difference is that . matches far more characters than [^">], which always stops at a closing quote or at the end of the tag.
The problem in your case might be the missing s modifier (PCRE_DOTALL), which lets . match newlines as well. Try: preg_match('/<input id="signuptoken" type="hidden" value="(.*?)"/s', $page, $m);
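To make the difference concrete, here is a minimal sketch comparing the lazy dot, the negated character class, and the effect of the s modifier. The markup strings and variable names are made up for illustration and are not taken from the real page.

```php
<?php
// Hypothetical malformed markup: the first value is missing its closing quote.
$html = '<input id="a" value="broken><input id="b" value="ok">';

// Lazy dot: runs past the end of the first tag until it finds any quote.
preg_match('/value="(.+?)"/', $html, $lazy);

// Negated class: cannot cross ">" or a quote, so the broken tag is skipped.
preg_match('/value="([^">]+?)"/', $html, $strict);

echo "lazy:   {$lazy[1]}\n";
echo "strict: {$strict[1]}\n";

// Hypothetical value that spans two lines.
$multiline = "<input id=\"t\" value=\"abc\ndef\">";

// Without the s modifier, "." does not match the newline, so nothing is found.
$withoutS = preg_match('/value="(.*?)"/', $multiline, $m1);

// With the s modifier, "." also matches newlines.
$withS = preg_match('/value="(.*?)"/s', $multiline, $m2);

var_dump($withoutS, $withS);
```

Note that the s modifier only changes what . matches; literal spaces in the pattern still have to match literal spaces in the page.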
Other than that, I'll second the suggestion to use DOM.
Also, save the page HTML to a file and test your regex against the local file instead of requesting the page every time.
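A rough sketch of that workflow might look like the following. The cache path and the stand-in $page string are assumptions for illustration; in the real script $page would be the result of curl_exec($ch).

```php
<?php
// Hypothetical cache location; any writable path would do.
$cacheFile = sys_get_temp_dir() . '/signup_page.html';

// Stand-in for the HTML returned by curl_exec($ch); the value is made up.
$page = '<form><input id="signuptoken" type="hidden" value="abc123" name="signuptoken"></form>';

// Save the page once...
file_put_contents($cacheFile, $page);

// ...then run every regex experiment against the local copy.
$cached = file_get_contents($cacheFile);
$found = preg_match('/<input id="signuptoken" type="hidden" value="([^">]+)"/', $cached, $m);

echo $found ? "Token: {$m[1]}\n" : "No match\n";
```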
Upvotes: 0
Reputation: 3548
You're better off using DOMDocument. For example:
$html = '<input id="signuptoken" type="hidden" value="03AHJ_Vuv2ts6ev2LltAkZB91vjD6k-BsW3286bTC9QZYZLSHQUMNDQJFUaNmAQMAYb9FDhIkOFzAisafasfsTZuv_pl5KvkYNfsGUPcOAEX5YPlMaMOi7MZJq4ky0v_GyM60SmMgjPrtfZSJYE0hqw--GsfsafasmER0Sksr6OAvnLnBVAMsKcCi7uM" name="signuptoken">';
$dom = new DOMDocument();
$dom->loadHTML($html);
// getElementById() returns null when no element has that id
$signuptoken = $dom->getElementById("signuptoken");
if ($signuptoken !== null) {
    echo $signuptoken->getAttribute('value');
}
Upvotes: 5
Reputation: 2889
This should work for you to find the value:
<?php
$input = '<input id="signuptoken" type="hidden" value="03AHJ_Vuv2ts6ev2LltAkZB91vjD6k-BsW3286bTC9QZYZLSHQUMNDQJFUaNmAQMAYb9FDhIkOFzAisafasfsTZuv_pl5KvkYNfsGUPcOAEX5YPlMaMOi7MZJq4ky0v_GyM60SmMgjPrtfZSJYE0hqw--GsfsafasmER0Sksr6OAvnLnBVAMsKcCi7uM" name="signuptoken">';
$result = preg_match('/<input id="signuptoken" type="hidden" value="(.*?)"/', $input, $matches);
if (!$result) {
    // Could not find input
} else {
    // Input value found
    echo 'Value: '.$matches[1];
}
Parsing HTML with regex is not exactly resilient, however: simply changing the order of the id and the type attributes in the example input tag will break the scraper. If you're sure the HTML will never change, that shouldn't be an issue, but be aware that a DOM parser may be more useful in some cases.
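As a small illustration of that fragility, the sketch below swaps the type and id attributes and runs both approaches. The shortened value "abc123" is made up, and the libxml calls are only there on the assumption that a real page would trigger parser warnings.

```php
<?php
// Same field as in the question, but with type and id swapped and a made-up value.
$html = '<input type="hidden" id="signuptoken" value="abc123" name="signuptoken">';

// The regex expects id before type, so it no longer matches.
$regexFound = preg_match('/<input id="signuptoken" type="hidden" value="(.*?)"/', $html, $matches);

// The DOM lookup does not depend on attribute order.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$node = $dom->getElementById('signuptoken');
$domValue = $node !== null ? $node->getAttribute('value') : null;

var_dump($regexFound);
var_dump($domValue);
```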
Upvotes: 2