cwd

Reputation: 54836

How do I grab part of a page's HTML DOM with PHP?

I'm grabbing data from a published Google spreadsheet, and all I want is the information inside the content div (<div id="content">...</div>).

I know that the content starts with <div id="content"> and ends with </div><div id="footer">.

What's the best / most efficient way to grab the part of the DOM that is inside there? I was thinking of a regular expression (see my example below), but it is not working and I'm not sure it is that efficient.

header('Content-type: text/plain');

$foo = file_get_contents('https://docs.google.com/spreadsheet/pub?key=0Ahuij-1M3dgvdG8waTB0UWJDT3NsUEdqNVJTWXJNaFE&single=true&gid=0&output=html&ndplr=1');

$start = '<div id="content">';
$end = '<div id="footer">';

$foo = preg_replace("#$start(.*?)$end#",'$1',$foo);

echo $foo;

UPDATE

I guess my other question is whether it is simpler and easier to use a regex with start and end points than to parse a DOM that might contain errors and then extract the piece I need. It seems like regex would be the way to go, but I would love to hear your opinions.

Upvotes: 0

Views: 264

Answers (3)

Michael Low

Reputation: 24506

Try changing your regex to $foo = preg_replace("#$start(.*?)$end#s",'$1',$foo); (the s modifier makes . match newlines as well). As it is, your regex needs all the content between the tags to be on the same line in order to match.
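A minimal sketch of the regex approach, assuming the two markers from the question appear literally in the page. The sample $html string is a made-up stand-in for the fetched spreadsheet, not its real markup. Note that preg_replace only strips the matched markers and leaves the rest of the page in place, so this sketch uses preg_match to capture just the part in between:

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$html = "<div id=\"header\">Title</div>\n"
      . "<div id=\"content\">\n<table><tr><td>A1</td></tr></table>\n</div>"
      . "<div id=\"footer\">Footer</div>";

$start = '<div id="content">';
$end   = '</div><div id="footer">';

// preg_quote() escapes any regex metacharacters in the literal markers.
// The s modifier lets . match newlines, so the content may span several lines.
$pattern = '#' . preg_quote($start, '#') . '(.*?)' . preg_quote($end, '#') . '#s';

// preg_match() returns 1 on a match; $m[1] then holds the captured content.
$content = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;

echo $content;
```

If the markers are not found, $content is null, which is easier to detect than a page that silently came back unchanged.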

If your HTML page is any more complex than that, then regex probably won't cut it, and you'd need to look into a parser like DOMDocument or Simple HTML DOM.

Upvotes: 1

DhruvPathak

Reputation: 43265

Do not use regex; it can fail. Use PHP's built-in DOM parser: http://php.net/manual/en/class.domdocument.php

You can easily traverse the document and extract the relevant content.
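As a rough sketch of what that could look like, assuming the page contains a div with id="content" as described in the question. The sample $html string is a made-up stand-in for the real page. libxml_use_internal_errors() keeps warnings about invalid markup from being printed, which matters when the source HTML may contain errors:

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$html = '<html><body><div id="header">Title</div>'
      . '<div id="content"><table><tr><td>A1</td></tr></table></div>'
      . '<div id="footer">Footer</div></body></html>';

$doc = new DOMDocument();

// Collect parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Locate the content div by its id attribute.
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//div[@id="content"]')->item(0);

// Serialize only the children, i.e. the inner HTML of the div.
$content = '';
if ($node !== null) {
    foreach ($node->childNodes as $child) {
        $content .= $doc->saveHTML($child);
    }
}

echo $content;
```

Unlike the start/end string approach, this does not depend on the footer div directly following the content div.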

Upvotes: 0

Last Rose Studios

Reputation: 2481

If you have a lot of this to do, I would recommend taking a look at http://simplehtmldom.sourceforge.net. It is really good for this sort of thing.

Upvotes: 0
