shanehoban

Reputation: 870

Nokogiri: get elements whether they exist or not

Quite simply: can you do a conditional scrape? I want an <a> tag within a parent, and if a <span> is contained within that parent (so the span, rather than the parent, holds the <a>), I still want to drill into the span to get the <a>.

Hopefully this example provides enough detail.

<tr>
  <td>1989</td>
  <td>
    <i>
      <a href="/wiki/Always_(1989_film)" title="Always (1989 film)">Always</a>
    </i>
  </td>
  <td>Pete Sandich</td>
</tr>

I can access the <a> fine using:

all_links = doca.search('//tr//td//i//a[@href]')

But what I want to know is whether I can also add a conditional: if there is a span surrounding the <a>, can that be expressed in the search?

<tr>
  <td>1989</td>
  <td>
    <i>
      <span>
        <a href="/wiki/Always_(1989_film)" title="Always (1989 film)">Always</a>
      </span>
    </i>
  </td>
  <td>Pete Sandich</td>
</tr>

So is there a way to conditionally grab the <a>, something like so:

all_links = doca.search('//tr//td//i//?span//a[@href]')

Here ?span would be a conditional: if a span is there, enter that level and then enter the link; if no span is there, skip it and just enter the link.

Thanks in advance, greatly appreciate any help!

Shane

Upvotes: 0

Views: 314

Answers (1)

Arup Rakshit

Reputation: 118299

Here we go:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-eot
<tr>
  <td>1989</td>
  <td>
    <i>
      <span>
        <a href='/wiki2/Always_(1989_film)' title='Always (1989 film)'>Always</a>
      </span>
    </i>
  </td>
  <td>
    <i>
      <a href='/wiki1/Always_(1989_film)' title='Always (1989 film)'>Always</a>
    </i>
  </td>
  <td>Pete Sandich</td>
</tr>
eot

# The predicate name(./..)='span' keeps only <a> elements whose parent is a <span>
links = doc.xpath("//tr//i//a[name(./..)='span']")
p links.size # => 1
p links.map { |n| n['href'] } # => ["/wiki2/Always_(1989_film)"]

Upvotes: 2
