dylankb

Reputation: 1200

How to find an element's text in Capybara while ignoring inner element text

In the HTML example below, I am trying to grab the $16.95 text in the outer span.sale element and exclude the text from the inner span.sale-text one.

<div class="price">
  <span class="sale">
    <span class="sale-text">"Low price!"</span>
    "$16.95"
  </span>
</div>

If I was using Nokogiri this wouldn't be too difficult.

price = doc.css('.sale')
price.search('.sale-text').remove
price.text

However, Capybara navigates nodes rather than removing them. I knew that something like price.text would grab text from all sub-elements, so I tried XPath to be more specific: p.find(:xpath, "//span[@class='sale']", :match => :first).text. However, this grabs text from the inner element as well.

Finally, I tried looping through all spans to see if I could separate the results, but I get an Ambiguous error.

p.find(:css, 'span').each { |result| puts result.text }
Capybara::Ambiguous: Ambiguous match, found 2 elements matching css "span"

I am using Capybara/Selenium as this is for a web scraping project with authentication complications.

Upvotes: 0

Views: 4407

Answers (1)

Thomas Walpole

Reputation: 49890

There is no single-statement way to do this with Capybara, since the DOM's concept of innerText doesn't really support what you want to do. Assuming p is the '.price' element, two ways you could get what you want are as follows:

  1. Since you know the node you want to ignore, just subtract that text from the whole text:

    p.find('span.sale').text.sub(p.find('span.sale-text').text, '')
    
  2. Grab the innerHTML string and parse that with Nokogiri or Capybara.string (which just wraps Nokogiri elements in the Capybara DSL):

    doc = Capybara.string(p['innerHTML'])
    nokogiri_fragment = doc.native
    # do whatever you want with the Nokogiri fragment
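
The subtraction idea in option 1 can be wrapped in a small helper. This is only a sketch: own_text is a hypothetical name, not part of Capybara, and the plain strings stand in for what element.text would presumably return for the markup in the question (assumed here to come back without the quote characters).

```ruby
# Hypothetical helper (not a Capybara method): removes the first occurrence
# of each child's text from the parent's text, then trims leftover whitespace.
def own_text(full_text, child_texts)
  child_texts.reduce(full_text) { |text, child| text.sub(child, '') }.strip
end

# With Capybara the inputs would come from the elements, for example:
#   sale = p.find('span.sale')
#   own_text(sale.text, [sale.find('span.sale-text').text])
# Plain strings are used below so the sketch runs without a browser.
puts own_text('Low price! $16.95', ['Low price!'])
```

Because String#sub with a string pattern only removes the first literal match, this can give the wrong result if the child's text also appears earlier in the parent's own text; in that case parsing the innerHTML as in option 2 is likely the safer route.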
    

Upvotes: 2
