rosen_

Reputation: 248

Which node to select with a DOM parser to get the title from an HTML page

This is the HTML page:

<div class="gs_ri">
   <h3 class="gs_rt">
     <span class="gs_ctc">
     <span class="gs_ct1">[BOOK]</span>
     <span class="gs_ct2">[B]</span></span>
     <a href="http://example.com" onmousedown="">Title</a></h3>
<div class="gs_a">A</div>
<div class="gs_rs">B</div>
<div class="gs_fl"><a href="">C</a> <a href="">D</a> <a href=""</a></div></div></div>  
<div class="gs_r"><div class="gs_ggs gs_fl"><button type="button" id="gs_ggsB2" class="gs_btnFI gs_in_ib gs_btn_half">
     <span class="gs_wr"><span class="gs_bg"></span>
     <span class="gs_lbl"></span>
     <span class="gs_ico"></span></span></button>
<div class="gs_md_wp" id="gs_ggsW2"><a href="http://example.pdf" onmousedown=""

I'm a little confused about how to determine the right node.

I want to get http://example.com and Title.

I thought there were two ways to get them.

First, the link is a sibling of the <span>:

 foreach($html->find('span[class=gs_ctc2] ') as $link){
    $link = $link->next_sibling();
    echo $link->plaintext;
    echo $link->href;
}

But this does not work.

Second, I take <h3 class="gs_rt"> as the parent, so the link is the sibling of the last child:

foreach($html->find('h3[class=gs_rt] a') as $link){
    $link = $link->last_child()->next_sibling();
    echo $link->plaintext;
    echo $link->href;
}

This does not work either. I think I don't yet understand the DOM node tree.

Upvotes: 0

Views: 493

Answers (2)

Sirko

Reputation: 74036

You do not have to select a sibling.

With h3[class=gs_rt] a you are already targeting the respective <a> tag. So just extract the desired values from there. You can, however, simplify that selector as follows:

foreach($html->find('h3.gs_rt a') as $link){
    echo $link->plaintext;
    echo $link->href;
}
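
If you want to double-check the tree structure without the parser library, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath instead of the parser from the question. That is a swapped-in technique, not what your code uses. The markup is a trimmed copy of yours, and the names $markup and $results are only for illustration.

```php
<?php
// Trimmed copy of the markup from the question (assumption: the rest of the page does not matter here).
$markup = <<<'HTML'
<div class="gs_ri">
  <h3 class="gs_rt">
    <span class="gs_ctc">
      <span class="gs_ct1">[BOOK]</span>
      <span class="gs_ct2">[B]</span></span>
    <a href="http://example.com" onmousedown="">Title</a></h3>
  <div class="gs_a">A</div>
</div>
HTML;

$doc = new DOMDocument();
// Real pages are rarely valid HTML, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath counterpart of "h3[class=gs_rt] a": every <a> below an <h3> whose class is exactly "gs_rt".
$results = [];
foreach ($xpath->query('//h3[@class="gs_rt"]//a') as $a) {
    $results[] = [
        'title' => trim($a->textContent),
        'href'  => $a->getAttribute('href'),
    ];
}

// The matched node is already the <a>, so no sibling or child navigation is needed.
foreach ($results as $r) {
    echo $r['title'], ' ', $r['href'], PHP_EOL;
}
```

The point is the same as above: once the selector has matched the <a>, you read its text and href directly from that node.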

EDIT

Regarding your comment: I think what you want is something like this, but I'm not sure, and your code above is quite a mess (please use proper indentation!).

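foreach($html->find('h3.gs_rt') as $block){
    // find() without an index returns an array, so pass 0 to get the first <a>
    $link = $block->find( 'a', 0 );
    echo $link->plaintext;
    echo $link->href;

    // without an index this is an array of matches
    $otherLink = $block->find( 'div[class=gs_md_wp] a' );
    // do stuff with that $otherLink
}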

Upvotes: 1

Mohamed Abo Elenen

Reputation: 54

Add an id to the link, then read its href and text with jQuery:

<a id="myid" href="http://example.com" onmousedown="javascript:get_title('#myid')">Title</a></h3>

function get_title(i){
    var h = $(i).attr('href');
    var t = $(i).text();
    alert('the link is (' + h + ' ) and the title is (' + t + ' )');
}

Upvotes: 0
