Reputation: 249
I grabbed the HTML from this URL: http://facebook.com/zuck. Echoing it to the client browser works fine, but I found it impossible to parse with PHP.
I am trying to extract the text inside div tags, for example:
preg_match_all("/<div class=\"mediaPageName\">(.*)<\/div>/",$html,$matches);
print_r($matches);
This returns an empty array. I also tried DOMDocument and PHP Simple HTML DOM Parser; both return empty elements and can't grab the text from the HTML.
How is that even possible? Is there a solution?
Upvotes: 1
Views: 2170
Reputation: 249
$html = str_replace(array('\u003c','\"','\/'), array('<','"','/'), $html);
preg_match_all('/<div class=\"mediaPageName\">(.*?)<\/div>/', $html, $matches);
var_dump($matches);
The code above works, but there must be a way to do it with a single preg_match call and also grab this tag: <span class="fwb">text</span>. I don't know how to write that in a single line.
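One possible way to cover both tags with a single preg_match_all call, as a rough sketch: it still undoes the escaping first, then uses one pattern with an alternation and a backreference. The sample string is hypothetical (modelled on the escaped markup seen in the page source), and the variable names $decoded and $found are mine, not anything from the page.

```php
<?php
// Hypothetical sample of the escaped markup; the real page source is much larger.
$html = '\u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>'
      . '\u003cspan class=\"fwb\">text\u003c\/span>';

// Undo the escaping so the markup looks like ordinary HTML again.
$decoded = str_replace(array('\u003c', '\"', '\/'), array('<', '"', '/'), $html);

// One pattern for both tags: group 1 is the tag name, group 2 the class,
// group 3 the text. \1 makes the closing tag match the opening one.
$pattern = '~<(div|span) class="(mediaPageName|fwb)">(.*?)</\1>~s';
preg_match_all($pattern, $decoded, $matches, PREG_SET_ORDER);

// Collect the captured text keyed by class name.
$found = array();
foreach ($matches as $m) {
    $found[$m[2]][] = $m[3];
}
print_r($found);
```

This assumes the class attribute holds exactly one class and that the tags are not nested; anything more complicated is probably a job for a DOM parser.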
Upvotes: 1
Reputation: 11395
It is quite possible.
The easiest way is to load the complete DOM into DOMDocument or phpQuery.
Edit:
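Looking at the source of the page you linked, the element you are searching for has its less-than characters (<) replaced with the Unicode escape \u003c, and its quotes and slashes are backslash-escaped. That is why a pattern or parser looking for a literal <div finds nothing.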
Example: \u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>
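If you want to stay with DOMDocument, a minimal sketch would be to undo the escaping first and only then load the markup. The sample string below is just the example line from this answer; the variable names ($raw, $decoded, $names) are placeholders I made up, and I am assuming the three escapes shown are the only ones that matter for this element.

```php
<?php
// Hypothetical input: the escaped fragment as it appears in the page source.
$raw = '\u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>';

// Turn the escaped sequences back into ordinary markup.
$decoded = str_replace(array('\u003c', '\"', '\/'), array('<', '"', '/'), $raw);

// Collect libxml warnings internally instead of printing them,
// since real-world markup is rarely clean.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($decoded);
libxml_clear_errors();

// Query for the div by its class attribute and collect the text.
$xpath = new DOMXPath($doc);
$names = array();
foreach ($xpath->query('//div[@class="mediaPageName"]') as $node) {
    $names[] = trim($node->textContent);
}
print_r($names);
```

The XPath test is an exact match on the class attribute, so it would miss elements that carry additional classes.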
Edit 2:
As mentioned by others, do not parse HTML when it is not necessary. But it looks like it is required in this case, as Frank Farmer mentions.
This regex will find some matches (only one per line, because the .* is greedy; hopefully someone can adjust it to get all the matches).
preg_match_all('%\\\\u003cdiv class=.*mediaPageName[^>]*>([^>]*)\\\\u003c%i', $html, $matches);
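A possible adjustment, offered as a sketch rather than a tested fix for the live page: spelling out the escaped quotes and the escaped closing tag and using a lazy (.*?) lets the pattern match the raw source directly and pick up several elements on the same line. The sample input is hypothetical (the second page name is invented for illustration).

```php
<?php
// Hypothetical input: two escaped elements on a single line.
$html = '\u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>'
      . '\u003cdiv class=\"mediaPageName\">Example Page\u003c\/div>';

// In a single-quoted PHP string, \\\\ becomes \\ in the regex,
// which matches one literal backslash in the subject.
$pattern = '%\\\\u003cdiv class=\\\\"mediaPageName\\\\">(.*?)\\\\u003c\\\\/div>%i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured text of every matching element.
print_r($matches[1]);
```

This assumes the class attribute is written exactly as in the example line above; if the page adds other attributes or classes, the pattern would need loosening.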
It may be worthwhile finding out how to use Unicode regex as outlined here.
Upvotes: 3
Reputation: 3772
You're probably going to be much better off in the long run if you just use the Graph API. The profile picture and some basic account information are public and require no authentication or authorization. Just issue a request for http://graph.facebook.com/zuck/picture, for example.
Upvotes: 2