user2257812

How can I search for a specific text element?

How can I search for the element containing "Click Here to Enter a New Password" using Nokogiri::HTML?

My HTML structure is like:

<table border="0" cellpadding="20" cellspacing="0" width="100%">
  <tbody>
  <tr>
    <td class="bodyContent" valign="top">
      <div>
        <strong>Welcome to</strong>
        <h2 style="margin-top:0">OddZ</h2>
        <a href="http://mandrillapp.com/track/click.php?...">Click Here</a>
        to Enter a New Password
        <p>
          Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.
          <br>
        </p>
      </div>
    </td>
  </tr>
  </tbody>
</table>

I tried:

doc = (Nokogiri::HTML(@inbox_emails.first.body.raw_source))

password_container = doc.search "[text()*='Click Here to Enter a New Password']"

but this did not find a result. I then tried a shorter string:

password_container = doc.search "[text()*='Click Here']"

That returned no result either.

I want to search the complete text.

I noticed there is a lot of whitespace before the text " to Enter a New Password", even though I did not add any spaces in the HTML code.

Upvotes: 0

Views: 151

Answers (4)

Mark Thomas

Reputation: 37517

You were close. Here's how you find the text's containing element:

doc.search('*[text()*="Click Here"]')

This gives you the <a> tag. Is this what you want? If you actually want the parent element of the <a>, which is the containing <div>, you can modify it like so:

doc.search('//*[text()="Click Here"]/..').text

This selects the containing <div>, the text of which is:

Welcome to
OddZ
Click Here
to Enter a New Password

Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.

Upvotes: 0

Roland Mai

Reputation: 31077

You can combine XPath with a regex. XPath 1.0, which Nokogiri uses, has no built-in matches function, so you can implement your own as a custom XPath function:

class RegexHelper
  def content_matches_regex node_set, regex_string
    ! node_set.select { |node| node.content =~ /#{regex_string}/mi }.empty?
  end

  def content_matches node_set, string
    content_matches_regex node_set, string.gsub(/\s+/, ".*?")
  end
end

search_string = "Click Here to Enter a New Password"

matched_nodes = doc.xpath "//*[content_matches(., '#{search_string}')]", RegexHelper.new
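To see what the gsub call in content_matches does to the search string, here is a small plain-Ruby sketch with no Nokogiri involved. The sample strings are made up for illustration, and the stricter \s+ variant is an alternative that is not part of the answer above.

```ruby
search_string = 'Click Here to Enter a New Password'

# Same transformation as content_matches: each whitespace run becomes ".*?",
# so any characters (including newlines, because of /m) may sit between words.
loose = /#{search_string.gsub(/\s+/, '.*?')}/mi

# Stricter alternative (assumption, not from the answer): allow only
# whitespace between words, and escape each word for regex safety.
strict = Regexp.new(
  search_string.split.map { |word| Regexp.escape(word) }.join('\s+'),
  Regexp::IGNORECASE
)

as_parsed = "Click Here\n        to Enter a New Password" # text with source indentation
unrelated = 'Click Here if you want to Enter a New Password'

puts loose.match?(as_parsed)  # true
puts strict.match?(as_parsed) # true
puts loose.match?(unrelated)  # true, because ".*?" also spans extra words
puts strict.match?(unrelated) # false
```

The loose pattern tolerates the indentation from the HTML source but can also match text with extra words in between, which may or may not be acceptable for a given email.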

Upvotes: 1

pguardiario

Reputation: 54984

Much of the text you are searching for is outside the <a> element.

The best you can do might be:

link = doc.search('a[text()="Click Here"]').find { |a| a.next&.text.to_s.match?(/to Enter a New Password/) }

Upvotes: 2

Ye Lin Aung

Reputation: 11459

You can try using a CSS selector. I've saved your HTML as a file called test.html.

require 'nokogiri'

@doc = Nokogiri::HTML(File.open('test.html'))

puts @result = @doc.css('p').text.gsub(/\n/, '')

It returns:

Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.

There's a good post about Parsing HTML with Nokogiri

Upvotes: 0
