Reputation: 21
I'm trying to pull in an element from an external website using PHP and cURL.
The link to the website I'm trying to pull content from is: http://www.stayclassy.org/fundraise?fcid=231864
The element I'm targeting is the number value under the list item
"Raised So Far" in the right column at the top (right now the value is at $10).
Here is the code I'm using to extract the data:
define("TARGET", "http://www.stayclassy.org/fundraise?fcid=231864");
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, TARGET);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
if(!($results = curl_exec($curl))) {
    print("{ \"total\": \"$0.00\" }");
    return;
}
$pattern = '/\<li class="goalTitle"\> \$(.+?) \<\/li\>\<\/a\>/';
preg_match_all($pattern, $results, $matches);
$total = $matches[1][0];
$total = str_replace(",", "", $total);
printf("{ \"total\": \"$%s\" }", formatMoney($total, true));
function formatMoney($number, $fractional=false)
{
    if ($fractional) {
        $number = sprintf('%.2f', $number);
    }
    while (true) {
        $replaced = preg_replace('/(-?\d+)(\d\d\d)/', '$1,$2', $number);
        if ($replaced != $number) {
            $number = $replaced;
        } else {
            break;
        }
    }
    return $number;
}
The issue I'm having is that the list item/element I'm targeting doesn't have a unique ID or class. In fact, the dollar amount is located in a separate list item without a class.
I was wondering how to target a specific list item in an unordered list using the code above, particularly when it doesn't have a class. Any ideas?
Upvotes: 2
Views: 831
Reputation: 4356
Targeting a specific item requires a unique string around it. Start from the line you want and keep expanding outward until the surrounding text occurs only once on the page. So, the line you want is:
<li>$10</li>
but this is not unique at all. So we expand the string by adding the previous line as well:
<li class="goalTitle">Raised so far:</li>
<li>$10</li>
and bingo, this string is unique for your needs. The string is fairly constant except for your amount, so it will be easy to use. So you need a regular expression that finds this string. I'd use something like this:
$pattern = '/<li class="goalTitle">Raised so far:<\/li>\s*<li>\$(\d+)<\/li>/';
You don't need to use preg_match_all because you only expect to get one match:
preg_match($pattern, $results, $matches);
$total = $matches[1];
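If the page markup ever changes, $matches[1] will not be set, so it may be worth checking the return value of preg_match first. A minimal sketch, assuming a hard-coded $results string in place of the real cURL response and an assumed fallback of '0':

```php
<?php
// Hypothetical stand-in for the cURL response body ($results in the question).
$results = '<li class="goalTitle">Raised so far:</li>' . "\n" . '<li>$10</li>';

$pattern = '/<li class="goalTitle">Raised so far:<\/li>\s*<li>\$(\d+)<\/li>/';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $results, $matches) === 1) {
    $total = $matches[1];
} else {
    $total = '0'; // assumed fallback for when the markup no longer matches
}

echo $total, "\n";
```

With the sample string above, $total ends up holding the captured digits without the dollar sign.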
Your other options include loading the page with a DOMDocument, and then using XPath or getElementById to parse the DOM. But that may be a little too much effort for this task.
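For comparison, here is roughly what the DOMDocument route could look like. This is only a sketch: the function name extractRaisedTotalDom and the sample markup are made up for illustration, and the XPath expression assumes the amount is the first li following the "Raised so far" li, as in the two lines quoted above. Since the target has no id, XPath is used rather than getElementById.

```php
<?php
// Hypothetical helper: returns the text of the <li> that follows the
// "Raised so far" <li>, or null when that structure is not found.
function extractRaisedTotalDom(string $html): ?string
{
    $doc = new DOMDocument();

    // Real pages are rarely well-formed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//li[@class="goalTitle"][contains(., "Raised so far")]/following-sibling::li[1]'
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Assumed sample markup, modelled on the snippet quoted in this answer.
$sampleHtml = <<<'HTML'
<ul>
  <li class="goalTitle">Goal:</li>
  <li>$5,000</li>
  <li class="goalTitle">Raised so far:</li>
  <li>$1,234.50</li>
</ul>
HTML;

echo extractRaisedTotalDom($sampleHtml), "\n";
```

Unlike the regular expression, this does not depend on the exact whitespace between the two list items, but it does need the DOM extension to be enabled.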
Also, I'd use file_get_contents to fetch the contents of the remote site. But that's just me.
UPDATE: To handle thousands separators as well, modify your pattern as follows:
$pattern = '/<li class="goalTitle">Raised so far:<\/li>\s*<li>\$([\d\.,]+)<\/li>/';
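Once the pattern accepts separators, the captured text still has to be cleaned before it can be treated as a number. A rough sketch of one way to do that, where parseRaisedAmount and the sample string are illustrative names and data rather than anything from the real page, and number_format stands in for the formatMoney helper in the question:

```php
<?php
// Hypothetical helper: returns the raised amount as a float, or null
// when the expected markup is not found.
function parseRaisedAmount(string $html): ?float
{
    $pattern = '/<li class="goalTitle">Raised so far:<\/li>\s*<li>\$([\d\.,]+)<\/li>/';

    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }

    // Drop the thousands separators before converting to a number.
    return (float) str_replace(',', '', $matches[1]);
}

// Assumed sample input with a thousands separator.
$sample = '<li class="goalTitle">Raised so far:</li>' . "\n" . '<li>$1,234.5</li>';

$amount = parseRaisedAmount($sample);
if ($amount !== null) {
    // Same output shape as the question's printf call.
    printf("{ \"total\": \"$%s\" }\n", number_format($amount, 2));
}
```

number_format adds the separators back and pads to two decimals, so the loop-based formatting from the question may not be needed.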
Upvotes: 2