Ben

Reputation: 249

Unable to parse the HTML of a Facebook public profile with PHP

I fetched the HTML from this URL: http://facebook.com/zuck. Echoing it to the client browser works without any problem, but I have found it impossible to parse with PHP.

I am trying to extract the text inside div tags, for example:

preg_match_all("/<div class=\"mediaPageName\">(.*)<\/div>/",$html,$matches);
print_r($matches);

This returns an empty array. I also tried DOMDocument and PHP Simple HTML DOM Parser; both return empty elements and cannot extract the text from the HTML.

How is that even possible? Is there a solution?

Upvotes: 1

Views: 2170

Answers (3)

Ben

Reputation: 249

$html = str_replace(array('\u003c','\"','\/'), array('<','"','/'), $html);
preg_match_all('/<div class=\"mediaPageName\">(.*?)<\/div>/', $html, $matches); 
var_dump($matches);

There must be a way to do this with a single preg_match call instead of the code above, and to also grab this tag: <span class="fwb">text</span>, but I don't know how to write it in a single line.
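One possible way to fold this into a single preg_match_all call is to match the escaped markup directly and use alternation for the two tags. This is only a sketch: the sample string in $html is made up for illustration (it follows the escaping handled by the str_replace above), and the pattern assumes the class attribute contains nothing but mediaPageName or fwb.

```php
<?php
// Hypothetical sample input, escaped the same way as the page source.
// A nowdoc is used so that the backslashes are kept literally.
$html = <<<'HTML'
\u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>\u003cspan class=\"fwb\">text\u003c\/span>
HTML;

// In the pattern, "\\" matches one literal backslash and "\1" refers back
// to the tag name captured by the first group (div or span).
$pattern = <<<'REGEX'
~\\u003c(div|span) class=\\"(mediaPageName|fwb)\\">(.*?)\\u003c\\/\1>~
REGEX;

preg_match_all($pattern, $html, $matches);

// Group 3 holds the text between the opening and closing tags.
$texts = $matches[3];
print_r($texts);
```

Note that the pattern is loose: it would also accept a div with class fwb or a span with class mediaPageName. The captured text is still in its escaped form, so if it can contain sequences such as \/ it would need the same str_replace treatment afterwards.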

Upvotes: 1

brian_d

Reputation: 11395

It is quite possible.

The easiest way is to load the complete DOM into DOMDocument or phpQuery.

Edit:

From looking at the source code of the page linked, the markup for the element you are searching for is escaped: each less-than character (<) is replaced with its Unicode escape, \u003c.

Example: \u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>
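Given markup escaped like that, one option is to undo the escaping first and then hand the result to DOMDocument. The following is a minimal sketch rather than a tested solution for the real page: $html holds only the example fragment above, and it assumes that \u003c, \" and \/ are the only escapes that need undoing.

```php
<?php
// Sample fragment in the escaped form shown above (single quotes keep
// the backslashes literal).
$html = '\u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>';

// Undo the three escapes so that the string becomes ordinary HTML.
$decoded = str_replace(array('\u003c', '\"', '\/'), array('<', '"', '/'), $html);

// Parse the decoded fragment; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($decoded);
libxml_clear_errors();

// Query for the div elements by class and collect their text.
$xpath = new DOMXPath($doc);
$names = array();
foreach ($xpath->query('//div[@class="mediaPageName"]') as $node) {
    $names[] = trim($node->textContent);
}
print_r($names);
```

On the real page the escaped markup sits inside script content, so the relevant portion would probably have to be extracted before decoding; that step is not shown here.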

Edit 2:
As mentioned by others, do not parse HTML when not necessary. However, as Frank Farmer mentions, it looks like it is required in this case.

This regex will find some matches (only one per line; hopefully someone can adjust it to get all the matches):

preg_match_all('%\\\\u003cdiv class=.*mediaPageName[^>]*>([^>]*)\\\\u003c%i', $html, $matches);
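A possible adjustment for the one-match-per-line limit: the greedy .* after class= runs to the last mediaPageName on the line, so replacing it with [^>]* (which cannot cross a tag boundary) and making the capture lazy should let every occurrence match. This is a sketch against a made-up two-element sample, not against the live page.

```php
<?php
// Hypothetical sample: two escaped elements on the same line.
$html = <<<'HTML'
\u003cdiv class=\"mediaPageName\">Nirvana\u003c\/div>\u003cdiv class=\"mediaPageName\">Foo Fighters\u003c\/div>
HTML;

// Same idea as the regex above, but [^>]* replaces the greedy .* and the
// capture group is lazy, so each element is matched separately.
// "\\" in the pattern matches one literal backslash.
$pattern = <<<'REGEX'
%\\u003cdiv class=[^>]*mediaPageName[^>]*>(.*?)\\u003c%i
REGEX;

preg_match_all($pattern, $html, $matches);

// Group 1 holds the text of each matched element.
print_r($matches[1]);
```

The capture stops at the next \u003c, so text that itself contains nested escaped tags would be cut short.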

It may be worthwhile finding out how to use Unicode regex as outlined here.

Upvotes: 3

bensnider

Reputation: 3772

You're probably going to be much better off in the long run if you just use the Graph API. The profile picture and some basic account information are public and require no authentication or authorization. Just issue a request for http://graph.facebook.com/zuck/picture, for example.

Upvotes: 2
