Zachary Lassiter

Reputation: 181

PHP Scrape HTML Between <pre> tags

I'm having trouble figuring out how to scrape HTML content from only inside <pre> and </pre> tags with PHP5.

Given a document like the following, I want to take the two (or more; the number is dynamic) pre tag areas and put them into an array.

blablabla
<pre>save
this
really</pre>
not this
<pre>save this too
really
</pre>
but not this

How do I put the content between the pre tags of an HTML file on another server into an array?

Upvotes: 0

Views: 1446

Answers (3)

hoju

Reputation: 29452

You could simply use a regular expression to extract all the content within pre tags.

In Python that would be:

re.compile('<pre>(.*?)</pre>', re.DOTALL).findall(html)
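Since the question asks about PHP, a rough PHP equivalent of the same pattern might look like the sketch below. It assumes the page markup is already in a string; the `$html` sample is copied from the question and `$pre_blocks` is just an illustrative name. Like the Python version, it only matches a bare `<pre>` tag with no attributes.

```php
<?php
// Sample input taken from the question; in practice $html would hold the fetched page.
$html = "blablabla\n<pre>save\nthis\nreally</pre>\nnot this\n<pre>save this too\nreally\n</pre>\nbut not this\n";

// The 's' modifier lets '.' match newlines (the counterpart of re.DOTALL).
// '.*?' is non-greedy, so each match stops at the first closing tag.
preg_match_all('#<pre>(.*?)</pre>#s', $html, $matches);

// $matches[1] holds the captured contents of every pre block.
$pre_blocks = $matches[1];
print_r($pre_blocks);
```

Using `#` as the pattern delimiter avoids having to escape the `/` in `</pre>`.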

Upvotes: 0

pguardiario

Reputation: 54984

I recommend using XPath:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$pre_tags = array();
foreach($xpath->query('//pre') as $node){
    $pre_tags[] = $node->nodeValue;
}
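As a self-contained variation on the snippet above, the sketch below wraps the same XPath query in a function and silences the parser warnings that `loadHTML` tends to emit on imperfect markup. The function name `extract_pre_blocks` and the placeholder URL in the comment are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the text content of every <pre> element in $html.
function extract_pre_blocks($html)
{
    // Collect libxml parse errors internally instead of raising PHP warnings,
    // since real-world pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $pre_tags = array();
    foreach ($xpath->query('//pre') as $node) {
        $pre_tags[] = $node->nodeValue;
    }
    return $pre_tags;
}

// For a page on another server, the markup could be fetched first, for example:
// $html = file_get_contents('http://example.com/page.html'); // placeholder URL
// (this needs allow_url_fopen to be enabled).
$html = "blablabla\n<pre>save\nthis\nreally</pre>\nnot this\n<pre>save this too\nreally\n</pre>\nbut not this\n";
$result = extract_pre_blocks($html);
print_r($result);
```

Unlike a plain string or regex match, this approach also picks up pre tags that carry attributes, because the parser handles the markup.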

Upvotes: 2

jli

Reputation: 6623

Assuming the HTML is well formed, you could do something like:

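$pos = 0;
$insideTheDiv = array();
// Find each opening tag, resuming the search where the previous one left off.
while (($pos = strpos($theHtml, "<pre>", $pos)) !== false) {
    $pos += 5; // skip past "<pre>" (5 characters)
    $endPrePos = strpos($theHtml, "</pre>", $pos);
    if ($endPrePos === false) {
        break; // unclosed tag: stop scanning
    }
    $insideTheDiv[] = substr($theHtml, $pos, $endPrePos - $pos);
}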

After it's done, $insideTheDiv should be an array of all the contents of the pre tags.

Demo: http://codepad.viper-7.com/X15l7P (it strips the newlines from the output)

Upvotes: 0
