Reputation: 138
I'm trying to get the value of a hidden input field from Twitter's follow page. I'm using file_get_contents on the URL and then trying to get the value of the input with preg_match_all, but something in my code isn't working because I just get back an empty array. I would really appreciate it if someone could look over my code and help me get it to work.
HTML code of the input field I want to get the value from (in this example, twitter):
<input type="hidden" name="screen_name" value="twitter">
My code snippet that I can't get to work:
$html = file_get_contents($url);
preg_match_all("/<input type=\"hidden\" name=\"screen_name\" value=\"(.*?)\">/", $html, $screen_name);
echo "<pre>", print_r($screen_name, true), "</pre>";
This code should output the value of the input field, in this example just twitter, within an array.
Edit: My code snippet works fine. I just hadn't noticed that Twitter only shows this hidden input field if you're logged in, and of course, if you use file_get_contents, your webserver is not logged in to Twitter, so it can't get the HTML you see when you are logged in. Thanks to vigikaran for pointing this out to me and to gilbert for improving the regex in my code snippet.
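Since the empty array came from the field simply not being in the fetched HTML, a quick check can tell "field missing" apart from "regex did not match". This is a minimal sketch, assuming the page has already been fetched into a string (in the question that string comes from file_get_contents($url)); the helper name diagnose_screen_name and both sample strings are hypothetical stand-ins for logged-in and logged-out responses.

```php
<?php
// Hypothetical helper: explains why a screen_name lookup did or did not succeed.
function diagnose_screen_name(string $html): string
{
    // If the attribute is not in the page at all (e.g. a logged-out response),
    // no regular expression can find it.
    if (strpos($html, 'name="screen_name"') === false) {
        return 'field not in HTML';
    }

    // Same pattern as in the question, written without the escaped quotes.
    if (preg_match('/<input type="hidden" name="screen_name" value="(.*?)">/', $html, $m) === 1) {
        return 'found: ' . $m[1];
    }

    return 'field present, but pattern did not match';
}

// Hypothetical sample responses; nothing is fetched over the network.
$loggedIn  = '<form><input type="hidden" name="screen_name" value="twitter"></form>';
$loggedOut = '<form><input type="hidden" name="authenticity_token" value="abc"></form>';

echo diagnose_screen_name($loggedIn), "\n";  // found: twitter
echo diagnose_screen_name($loggedOut), "\n"; // field not in HTML
```

The third outcome ("field present, but pattern did not match") is the one a more whitespace-tolerant regex would address; the first outcome is the logged-out situation described in the edit, which no pattern can fix.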
Upvotes: 0
Reputation: 3776
Without actually fetching a Twitter page, I notice that your regular expression breaks if there is extra whitespace within the HTML tag. This can be a real problem for screen scraping. Try:
'/<input\s+type="hidden"\s+name="screen_name"\s+value="(.*?)">/',
or, if you want better resistance to small changes on Twitter's part, the following will work as long as name= precedes value=:
'/<input\s+(?:[^<>]*\s)?name\s*=\s*"screen_name"[^<>]*\s+value\s*=\s*"(.*?)"\s*\/?>/',
(edited above to improve resistance to white-space changes)
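Even tolerant patterns still depend on attribute order and quoting. A different technique, swapped in here rather than taken from the answer, is to let PHP's DOM extension parse the markup and select the input with XPath, which ignores attribute order and whitespace inside the tag. This is a minimal sketch, assuming the dom extension is enabled (it is in most PHP builds); the function name screen_name_via_dom is hypothetical.

```php
<?php
// Hypothetical helper: parse the HTML and read the hidden input's value.
function screen_name_via_dom(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Attribute order and whitespace inside the tag do not matter to XPath.
    $nodes = $xpath->query('//input[@type="hidden"][@name="screen_name"]');

    if ($nodes === false || $nodes->length === 0) {
        return null; // field not present, e.g. in a logged-out page
    }

    return $nodes->item(0)->getAttribute('value');
}

// Attributes in a different order and with extra spaces still work.
echo screen_name_via_dom('<input value="twitter"  name="screen_name" type="hidden">'), "\n"; // twitter
```

Returning null (rather than an empty array) makes the "field not in the page" case explicit, which is exactly what happens when the server fetching the page is not logged in.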
Upvotes: 2
Reputation: 138
Thanks to vigikaran for pointing this out to me: my code is fine and works, but the HTML I got from Twitter doesn't contain the hidden input field I was looking for, because Twitter only shows this field if you're logged in, and of course, if you use file_get_contents, your webserver is not logged in. Thanks to everyone for the help, and to gilbert for improving the regex from my code snippet.
Upvotes: 1
Reputation: 5664
This is working for me:
$html = '<input type="hidden" name="screen_name" value="twitter">';
preg_match_all('/<input type="hidden" name="screen_name" value="(.*?)">/', $html, $screen_name);
echo "<pre>", print_r($screen_name, true), "</pre>";
You can check it here: https://eval.in/626194
The captured string is in $screen_name[1][0].
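Because only one such field is expected, preg_match (which stops at the first match) gives a flatter result than preg_match_all: with preg_match_all's default ordering, index 1 holds every capture of group 1, hence the extra [0]. A small sketch using the same sample string; the variable names $all and $one are illustrative.

```php
<?php
$html = '<input type="hidden" name="screen_name" value="twitter">';
$pattern = '/<input type="hidden" name="screen_name" value="(.*?)">/';

// preg_match_all: $all[0] holds the full matches, $all[1] every capture of group 1.
preg_match_all($pattern, $html, $all);
echo $all[1][0], "\n"; // twitter

// preg_match: returns 1 on a match, 0 otherwise; $one[1] is group 1 of the first match.
if (preg_match($pattern, $html, $one) === 1) {
    echo $one[1], "\n"; // twitter
}
```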
Upvotes: 2