Leon

Reputation: 51

PHP parsing external site

I don't have any experience parsing external URLs to grab data from them, but today I tried some experiments:

$str1 = file_get_contents('http://www.indiegogo.com/projects/ubuntu-edge');
$test1 = strstr($str1, "amount medium clearfix");
$parts = explode(">",$test1);
$parts2 = vsprintf("%s", $parts[1]);

$str2 = file_get_contents('http://www.indiegogo.com/projects/ubuntu-edge');
$test2 = strstr($str2, "money-raised goal");
$test3 = str_ireplace("money-raised goal", "", "$test2");
$test4 = str_ireplace("\"", "", "$test3");
$test5 = str_ireplace(">", "", "$test4");
$test6 = substr($test5, 0, 29);
$test7 = explode("Raised of", $test6);
$test8 = vsprintf("%s", $test7[1]);

Try the code with:

print_r($parts2); then with print_r($test8); and then with echo "$parts2 - $test8";

Because the Ubuntu Edge campaign is so popular these days, I tried to get the two fields from the site (only as an experiment), but without success. Well, it grabs the two fields, but I could not put both in the same variable. The output is either $parts2 alone, or $parts2 containing the value of $test8, or only $test8.

What am I doing wrong, and why? Also, is there a simpler method to do what I want, without so much code?

Upvotes: 1

Views: 1892

Answers (1)

Herbert

Reputation: 5778

Well, it grabs the two fields, but I could not put both in the same variable.

Not sure what you mean there.

Also, is there a simpler method to do what I want, without so much code?

Without so much code? No. More flexible and (possibly) efficient? Yes.

Try this and tailor it to your liking:

<?php
$page = file_get_contents('http://www.indiegogo.com/projects/ubuntu-edge');

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($page);

$finder = new DOMXPath($doc);

// find class="money-raised"
$nodes = $finder->query("//*[contains(@class, 'money-raised')]");

// get the children of the first match  (class="money-raised")
$raised_children = $nodes->item(0)->childNodes;

// get the children of the second match (class="money-raised goal")
$goal_children = $nodes->item(1)->childNodes;

// get the amount value
$money_earned = $raised_children->item(1)->nodeValue;

// get the goal value (first dollar amount in the text)
preg_match('/\$[\d,]+/', $goal_children->item(0)->nodeValue, $m);
$money_earned_goal = $m[0];


echo "Money earned: $money_earned\n";
echo "Goal: $money_earned_goal\n";

?>

This has eleven lines of code without the echo statements (compared to your twelve), but it requests the other site only once. Scraping websites is a somewhat involved task. This code gets exactly the values you wanted from this exact page, so expect to adjust it if the page's markup changes.
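Since the code is tied to one page's layout, it may help to wrap the lookup in a function that returns null instead of failing when the expected element is missing. The following is only a rough sketch: the function name first_amount and the sample markup are made up for illustration and are not taken from the real Indiegogo page.

```php
<?php
// Hypothetical helper: returns the first dollar amount found in the first
// element whose class attribute contains $class, or null if nothing matches.
// $class is pasted into the XPath as-is, so it must not contain quotes.
function first_amount(string $html, string $class): ?string
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);   // collect warnings from sloppy HTML silently
    $doc->loadHTML($html);
    libxml_clear_errors();

    $finder = new DOMXPath($doc);
    $nodes = $finder->query("//*[contains(@class, '$class')]");
    if ($nodes === false || $nodes->length === 0) {
        return null;                    // no element with that class
    }

    // textContent is the element's text including all of its descendants
    if (preg_match('/\$[\d,]+/', $nodes->item(0)->textContent, $m) !== 1) {
        return null;                    // element found, but no amount inside
    }
    return $m[0];
}

// Made-up markup, for illustration only
$sample = '<div class="money-raised"><span class="amount">$1,234</span> Raised</div>'
        . '<div class="money-raised goal">Raised of $32,000,000 Goal</div>';

echo "Money earned: ", first_amount($sample, 'money-raised'), "\n";
echo "Goal: ", first_amount($sample, 'money-raised goal'), "\n";
```

For the live page you would pass in the result of file_get_contents() (after checking it is not false) and treat a null return as a sign that the markup has changed.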

If you want to scrape sites, I strongly recommend learning to use DOMDocument and DOMXPath. There is a lot to learn, but it's worth the effort.
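One detail worth knowing early: contains(@class, '...') is a plain substring test, so it also matches longer class names that happen to include the text. A common XPath 1.0 idiom pads the attribute with spaces to match a whole class token instead. Here is a small sketch on made-up markup (the helper node_texts and the class names are only for illustration):

```php
<?php
// Collect the text of every node in a result list
function node_texts(DOMNodeList $nodes): array
{
    $out = [];
    foreach ($nodes as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}

// Made-up markup: "goalpost" contains "goal" but is a different class
$html = '<p class="goal">A</p>'
      . '<p class="money-raised goal">B</p>'
      . '<p class="goalpost">C</p>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Substring match: any class attribute containing the text "goal"
$loose = $xpath->query("//*[contains(@class, 'goal')]");

// Token match: "goal" must appear as a whole space-separated class name
$exact = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' goal ')]"
);

echo implode(', ', node_texts($loose)), "\n";   // A, B, C
echo implode(', ', node_texts($exact)), "\n";   // A, B
```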

Upvotes: 2
