Reputation:
In my application I am trying to get the number of pages Google has indexed, and I found that the number is available in the following div:
<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>
Now my question is: how do I extract the number from the above div in a web page?
Upvotes: 1
Views: 8402
Reputation: 20286
$str = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div> ';
$matches = array();
preg_match('/<div id="resultStats"> About ([0-9,]+?) results[^<]+<\/div>/', $str, $matches);
print_r($matches);
Output:
Array (
[0] => <div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>
[1] => 1,960,000
)
This is a simple regex with subpatterns:
([0-9,]+?) - matches the digits 0-9 and the , character, at least once and non-greedily.
[^<]+ - matches any character except <, one or more times.
echo $matches[1]; - prints the number you want.
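If the count is needed as a number rather than a string, the thousands separators have to be stripped before casting. A minimal sketch, assuming the markup looks exactly like the sample in the question; the helper name parseResultCount is made up for illustration:

```php
<?php
// Hypothetical helper: returns the result count as an int, or null when the div is not found.
function parseResultCount(string $html): ?int
{
    if (!preg_match('/<div id="resultStats">\s*About ([0-9,]+) results/', $html, $m)) {
        return null; // pattern not present in the input
    }
    // Remove the commas so "1,960,000" casts cleanly to 1960000.
    return (int) str_replace(',', '', $m[1]);
}

$str = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>';
var_dump(parseResultCount($str)); // int(1960000)
```

Returning null instead of 0 keeps "no match" distinguishable from a real count of zero.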
Upvotes: 3
Reputation: 3789
Never use regexp to parse HTML. (See: RegEx match open tags except XHTML self-contained tags)
Use an HTML parser, like Simple HTML DOM (http://simplehtmldom.sourceforge.net/)
You can then use CSS selectors to pick out the element:
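// file_get_html() comes from the Simple HTML DOM library file.
include 'simple_html_dom.php';
// Pass the URL of a results page here; the resultStats div only appears there.
$html = file_get_html('http://www.google.com/');
$divContent = $html->find('div#resultStats', 0)->plaintext;
$matches = array();
preg_match('/([0-9,]+)/', $divContent, $matches);
echo $matches[1];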
Outputs: "1,960,000"
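The same approach should also work with PHP's bundled DOM extension, without a third-party library. A sketch, assuming the page HTML has already been fetched into a string; here the sample div from the question stands in for the real page, and the variable names are illustrative:

```php
<?php
// Stand-in for the fetched page (assumption: the real page contains this div).
$html = '<html><body><div id="resultStats"> About 1,960,000 results (0.38 seconds) </div></body></html>';

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Select the div by its id attribute.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="resultStats"]')->item(0);

$count = null;
if ($node !== null && preg_match('/[0-9][0-9,]*/', $node->textContent, $m)) {
    // First run of digits and commas is the result count; drop the commas and cast.
    $count = (int) str_replace(',', '', $m[0]);
}
var_dump($count); // int(1960000)
```

The regex here only runs on the text of one element, so it does not have to cope with tags or attributes.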
Upvotes: 4
Reputation: 1509
You can use a regex (preg_match) for that:
$your_div_string = '<div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>';
preg_match('/<div.*>(.*)<\/div>/i', $your_div_string, $result);
print_r($result);
The output will be:
Array (
[0] => <div id="resultStats"> About 1,960,000 results (0.38 seconds) </div>
[1] => About 1,960,000 results (0.38 seconds)
)
This way you get the content inside the div.
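One caveat with this pattern: .* is greedy, so on a page with more than one div it can capture a different div than intended. A small sketch of the difference, using a made-up two-div string; the tighter pattern is a suggested variant, not part of the original answer:

```php
<?php
// Made-up input with two divs on one line.
$page = '<div id="a">one</div><div id="b">two</div>';

// Greedy: "<div.*>" swallows as much as it can, so the capture comes from the last div.
preg_match('/<div.*>(.*)<\/div>/i', $page, $greedy);

// Tighter variant: [^>]* stays inside the opening tag, and .*? stops at the first closing tag.
preg_match('/<div[^>]*>(.*?)<\/div>/i', $page, $lazy);

echo $greedy[1], "\n"; // two
echo $lazy[1], "\n";   // one
```

To target one specific div, include its id in the pattern, for example <div id="resultStats"[^>]*>.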
Upvotes: 1