rubyist

Reputation: 3132

Find elements with names matching a pattern

I am trying to get some nodes from the XML below.

<SalesStart Value="1412899200">10.10.2014</SalesStart>
<SalesEnd Value="4102358400">31.12.2099</SalesEnd>
<Price Value="4.9900">4,99</Price>
<SalesStartEst Value="1411516800">24.09.2014</SalesStartEst>
<SalesEndEst Value="1697500800">17.10.2023</SalesEndEst>

I can access nodes with doc.text_at('SalesStart'). Is it possible to select nodes with a regular expression, something like

doc.text_at('Sales'[/Start/]) or doc.css('Sales'[/Start/])

so that I can get both nodes (SalesStart and SalesStartEst) in a single query?

Upvotes: 2

Views: 937

Answers (1)

Phrogz

Reputation: 303490

You cannot use a generic regular expression in a Nokogiri XPath query, because Nokogiri relies on libxml2, which only supports XPath 1.0. In your case, though, you just want elements whose name starts with SalesStart, and XPath 1.0 can do that with the starts-with() function:

# Find all elements, ensuring the correct prefix on the name
doc.xpath("//*[starts-with(name(),'SalesStart')]")

Demo:

require 'nokogiri'
doc = Nokogiri.XML '
  <r>
    <SalesStart Value="1412899200">10.10.2014</SalesStart>
    <SalesEnd Value="4102358400">31.12.2099</SalesEnd>
    <Price Value="4.9900">4,99</Price>
    <SalesStartEst Value="1411516800">24.09.2014</SalesStartEst>
    <SalesEndEst Value="1697500800">17.10.2023</SalesEndEst>
  </r>
'

starts = doc.xpath("//*[starts-with(name(),'SalesStart')]").map(&:text)
p starts #=> ["10.10.2014", "24.09.2014"]

If you do need a regular expression, you can over-select the elements using Nokogiri and then use Ruby to pare down the set. For example:

# memory-heavy approach; pulls all elements and then pares them down
starts = doc.xpath('//*').select{ |e| e.name =~ /^SalesStart/ }

# lightweight approach, accessing one node at a time
starts = []
doc.traverse do |node|
  starts << node if node.element? && node.name =~ /^SalesStart/
end
p starts.map(&:text) #=> ["10.10.2014", "24.09.2014"]

You can even wrap this up as a convenience method:

# monkeypatching time!
class Nokogiri::XML::Node
  def elements_with_name_matching( regex )
    [].tap { |result| traverse { |n| result << n if n.element? && n.name =~ regex } }
  end
end

p doc.elements_with_name_matching( /^SalesStart/ ).map(&:text)
#=> ["10.10.2014", "24.09.2014"]

Upvotes: 1
