zish

Reputation: 627

PHP Simple Html Dom extract multiple tags from one class

I am new to using Simple HTML DOM with PHP and I am struggling to extract multiple HTML tags from one class. I have multiple blocks of HTML like this on a single page:

    <div class="file-right"> 
         <a href="/secrets-of-the-millionaire-mind-tomocubcom-e17682584.html" class="ai-similar" data-id="17682584" data-loc="3">
           <h2><b>Secrets</b> of the <b>Millionaire</b> <b>Mind</b> - TOMOCUB.COM</h2>
         </a>
           <span class="fi-pagecount">223 Pages</span>
           <span class="fi-year">2005</span>
           <span class="fi-size hidemobile">1015 KB</span>
         </div>
     2 - <b>Secrets</b> of the <b>Millionaire</b> <b>Mind</b> and your achievement of <b>success</b>. As you’ve probably fo&nbsp;...
   </div> 

From each of these blocks I want to extract:

  1. href link
  2. the plain text in tags
  3. each of the 3 span's element text

I have been trying to do this in PHP but keep getting errors. This is the code I have so far:

    $html = @str_get_html($response);
    $allblocks = $html->find('div.file-right'); // this selects all file-right blocks
    if (isset($allblocks)) {
        foreach ($allblocks as $singleblock) {
            echo $singleblock->plaintext; // but I get an error here: PHP Notice:  Array to string conversion
        }
    }

Can anyone help me, please?

Upvotes: 2

Views: 928

Answers (1)

Nigel Ren

Reputation: 57121

You need to take the HTML apart one layer at a time. You started by finding the <div> tag. From there you can find the <a> tag within that <div> and read its href attribute (using ->href). This code assumes that there is only one <a> tag in each block, so rather than looping with a foreach it just uses the first one (using [0]).

The <span> tags follow a similar process, but as there are several of them, this time the code uses a foreach. It outputs the class attribute and the contents of each span.

    $html = str_get_html($response);
    // Select every <div class="file-right"> block.
    $allblocks = $html->find('div.file-right');
    if ( count($allblocks) > 0 ) {
        foreach ( $allblocks as $block ) {
            // Assumes each block holds exactly one <a>, so take the first match.
            $anchor = $block->find("a");
            echo "href=".$anchor[0]->href.PHP_EOL;
            echo "text=".$anchor[0]->plaintext.PHP_EOL;
            // A block has several <span> tags, so loop over all of them.
            $spans = $block->find("span");
            foreach ( $spans as $span ) {
                echo "span=".$span->class."=".$span->plaintext.PHP_EOL;
            }
        }
    }

Note that your original code used isset($allblocks). Because the line before assigns its value, the variable is set even when find() matches nothing, so that check always passes. In this code I use count() to check whether the previous call to find() returned anything.
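
The difference between the two checks can be seen without the library at all. This is only a small sketch: the empty array below is a stand-in for what find() hands back when nothing matches, which is an assumption about the return value rather than something taken from the library's source.

```php
<?php
// Stand-in for the result of $html->find('div.file-right') with no matches
// (assumed here to be an empty array).
$allblocks = [];

// isset() only asks "does the variable exist and is it not null?"
$isSet = isset($allblocks);

// count() and empty() look at the contents instead.
$hasBlocks = count($allblocks) > 0;
$isEmpty   = empty($allblocks);

var_dump($isSet);     // bool(true)
var_dump($hasBlocks); // bool(false)
var_dump($isEmpty);   // bool(true)
```

So a guard written with isset() would still enter the if block for an empty result, while count() or empty() would not.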

With your sample HTML, wrapped only in a minimal page, the output is...

href=/secrets-of-the-millionaire-mind-tomocubcom-e17682584.html
text=            Secrets of the Millionaire Mind - TOMOCUB.COM          
span=fi-pagecount=223 Pages 
span=fi-year=2005 
span=fi-size hidemobile=1015 KB 
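
The text= and span= values in this output still carry the whitespace that surrounds them in the markup. If that is unwanted, trim() can tidy it up. The following is a minimal sketch only: the hard-coded strings are stand-ins for ->href, ->plaintext and ->class, copied from the sample output, and the $record layout is a hypothetical way of collecting the values rather than anything the library prescribes.

```php
<?php
// Stand-ins for the values read from the parsed block.
$href     = "/secrets-of-the-millionaire-mind-tomocubcom-e17682584.html";
$rawText  = "            Secrets of the Millionaire Mind - TOMOCUB.COM          ";
$rawSpans = [
    "fi-pagecount"       => "223 Pages ",
    "fi-year"            => "2005 ",
    "fi-size hidemobile" => "1015 KB ",
];

// Hypothetical record layout: one array per file-right block.
$record = [
    "href"  => $href,
    "text"  => trim($rawText),
    // array_map() keeps the string keys when it is given a single array.
    "spans" => array_map("trim", $rawSpans),
];

print_r($record);
```

Collecting each block into an array like this, instead of echoing as you go, also makes it easier to reuse the values later, for example to store them or encode them as JSON.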

Upvotes: 1
