Reputation: 1200
If you look up any URL on Alexa, you get detailed traffic information for that site. I would like to parse the "Visitors by Country" data from Alexa.
For example, for google.com the URL is http://www.alexa.com/siteinfo/google.com.
On the Audience tab you can see:
Visitors by Country for Google.com
United States 35.0%
India 8.8%
China 4.1%
Germany 3.4%
United Kingdom 3.2%
Brazil 3.2%
Iran 2.8%
Japan 2.1%
Russia 2.0%
Italy 1.9%
Indonesia 1.7% //etc.
How can I extract only this information from alexa.com? I have tried the preg_match function, but it is very difficult in this case.
Upvotes: 1
Views: 1055
Reputation: 2547
If you don't want to use DOM and getElementById, which is the most elegant solution in this case, you can try a regular expression:
$data = file_get_contents('http://www.alexa.com/siteinfo/google.com');
// Each match holds the country code from the link URL and the country name.
preg_match_all(
    '/<a href="\/topsites\/countries\/(.*)">(.*)<\/a>/mU',
    $data,
    $result,
    PREG_SET_ORDER
);
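To see what $result holds, here is a minimal sketch that runs the same pattern against a small inline sample. The $html string and the $names array are hypothetical and only illustrate the idea; the real Alexa markup may differ.

```php
<?php
// Hypothetical sample markup; the real Alexa page structure may differ.
$html = '<a href="/topsites/countries/US">United States</a> 35.0%' . "\n"
      . '<a href="/topsites/countries/IN">India</a> 8.8%';

preg_match_all(
    '/<a href="\/topsites\/countries\/(.*)">(.*)<\/a>/mU',
    $html,
    $result,
    PREG_SET_ORDER
);

// With PREG_SET_ORDER each element of $result is one match:
// [0] is the full match, [1] the country code from the URL, [2] the link text.
$names = array();
foreach ($result as $match) {
    $names[$match[1]] = trim($match[2]);
}
print_r($names);
```

Note that this pattern captures only the country code and name. The percentage sits outside the link, so it would need an extended pattern or the DOM approach.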
The DOM solution looks like:
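// Collect parse warnings from malformed real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTMLFile('http://www.alexa.com/siteinfo/google.com');
$data = $doc->getElementById('visitors-by-country');
$countries = array();
// getElementById returns null if the element is missing, so guard against it.
$my_data = ($data !== null) ? $data->getElementsByTagName('div') : array();
foreach ($my_data as $node)
{
    foreach ($node->getElementsByTagName('a') as $href)
    {
        // The percentage is part of the text of the div that contains the link.
        if (preg_match('/([0-9\.\%]+)/', $node->nodeValue, $match))
        {
            $countries[trim($href->nodeValue)] = $match[0];
        }
    }
}
var_dump($countries);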
Upvotes: 3