Mick Lasocki

Reputation: 139

How to search for XML data and replace it with a new value using the Nokogiri Ruby gem

Based on the example XML file employees.xml below, and using the Ruby Nokogiri gem, I want to open the file, change the building number to 320 and the room number to 99 for Sandra Defoe, and save the changes. What is the recommended way to do it?

<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be129">
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <building>327</building>
        <room>19</room>
    </employee>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>327</building>
        <room>22</room>
    </employee>
    <employee id="be133">
        <firstname>Steve</firstname>
        <lastname>Casey</lastname>
        <building>327</building>
        <room>24</room>
    </employee>
</employees>

Upvotes: 0

Views: 136

Answers (2)

the Tin Man

Reputation: 160551

I'd use this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>327</building>
        <room>22</room>
    </employee>
</employees>
EOT

first_name = 'Sandra'
last_name = 'Defoe'
node = doc.at("//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name])
node.at('building').content = '320'
node.at('room').content = '99'

Which results in:

doc.to_xml
# => "\uFEFF<?xml version=\"1.0\" encoding=\"utf-16\"?>\n" +
#    "<employees>\n" +
#    "    <employee id=\"be130\">\n" +
#    "        <firstname>William</firstname>\n" +
#    "        <lastname>Defoe</lastname>\n" +
#    "        <building>326</building>\n" +
#    "        <room>14a</room>\n" +
#    "    </employee>\n" +
#    "    <employee id=\"be132\">\n" +
#    "        <firstname>Sandra</firstname>\n" +
#    "        <lastname>Defoe</lastname>\n" +
#    "        <building>320</building>\n" +
#    "        <room>99</room>\n" +
#    "    </employee>\n" +
#    "</employees>\n"

Normally I recommend using CSS selectors because they tend to result in less visual noise. However, CSS doesn't let us peek into the text of nodes, and working around that, while possible, results in even more noise. XPath, on the other hand, can be very noisy, but for this sort of task it's more usable.

XPath is very well documented and figuring out what this is doing should be pretty easy.

The Ruby side of it is using a "format string":

"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]

similar to

"%s %s" % [first_name, last_name] # => "Sandra Defoe"
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name] 
# => "//employee[firstname/text()='Sandra' and lastname/text()='Defoe']"

Just for thoroughness, here's what I'd do if I wanted to use CSS exclusively:

node = doc.search('employee').find { |employee|
  employee.at('firstname').text == first_name && employee.at('lastname').text == last_name
}

This gets ugly though, because search tells Nokogiri to retrieve all employee nodes from libXML, then Ruby has to walk through them all telling Nokogiri to tell libXML to look in the child firstname and lastname nodes and return their text. That's slow, especially if there are many employee nodes and the one you want is at the bottom of the file.

The XPath selector tells Nokogiri to pass the search to libXML which parses it, finds the employee node with the child nodes containing the first and last names and returns only that node. It's much faster.

Note that at('employee') is equivalent to search('employee').first.

   # File 'lib/nokogiri/xml/searchable.rb', line 70

   def at(*args)
     search(*args).first
   end

Finally, meditate on the difference between NodeSet#text and Node#text, as the first will lead to insanity.

Upvotes: 2

lacostenycoder

Reputation: 11216

Assume your content is a string:

require 'nokogiri'

xml = %q(
<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be129">
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <building>327</building>
        <room>19</room>
    </employee>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>327</building>
        <room>22</room>
    </employee>
    <employee id="be133">
        <firstname>Steve</firstname>
        <lastname>Casey</lastname>
        <building>327</building>
        <room>24</room>
    </employee>
</employees>)

doc = Nokogiri.parse(xml)

This works, but it assumes each first and last name combination is unique. If it isn't, only the first matching employee is modified.

target = doc.css('employee').find do |node|
  node.at_css('firstname').text == 'Sandra' &&
  node.at_css('lastname').text == 'Defoe'
end

target.at_css('building').content = '320'
target.at_css('room').content = '99'

doc # outputs the updated xml
=> <?xml version="1.0"?>
<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be129">
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <building>327</building>
        <room>19</room>
    </employee>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>320</building>
        <room>99</room>
    </employee>
    <employee id="be133">
        <firstname>Steve</firstname>
        <lastname>Casey</lastname>
        <building>327</building>
        <room>24</room>
    </employee>
</employees>

Upvotes: 1
