Andreas Wederbrand

Reputation: 40001

Split parent node into two siblings at given node

I have an XML document (example below) and I need to split one node into two at a certain child node.

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

This is the resulting XML

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
    </trkseg>   <-- this line is new
    <trkseg>    <-- this line is new
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

This XML is simplified; in reality there are thousands of trkpt elements.

I have no problem finding where to do the split using Nokogiri, but I don't have a good idea of how to make the split.

Upvotes: 1

Views: 252

Answers (2)

matt

Reputation: 79743

You may find this easier if you think in terms of nodes of the parsed data structure rather than textual XML elements.

In this case you want to add a new trkseg node after the first, then remove the last trkpt node and move it to this new node. Something like this should work:

d = Nokogiri.XML(the_original_xml)

# find the node to move and remove it from its current position
trkpt3 = d.at_xpath("//trkpt[3]")
trkpt3.remove

# create a new node of type trkseg
new_node = d.create_element("trkseg")

# add the trkpt3 node to this new node
new_node.add_child(trkpt3)

# add the new node into position as a child of the trk node
d.at_xpath("//trk").add_child(new_node)

The actual result of this isn’t quite the same as what you’re after, as it doesn’t account for the whitespace nodes, but otherwise the structure is the same – it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
      
    </trkseg>
  <trkseg><trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt></trkseg></trk>
</gpx>

If the exact whitespace were important, you could be more precise about how you reconstruct the document to get exactly the result you need.

You’ll probably need different XPath queries than this in a real situation, but the general idea of manipulating the DOM structure with methods like remove, add_child, <<, create_element, and create_text_node is what you need.


A general purpose method

Here’s an example of a method you can use to split a node in two, with the split made after the node passed in as an argument:

def split_after(node)
  # Find all the nodes to be moved to the new node
  to_move = node.xpath("following-sibling::node()")
  # The parent node, this is the node that will be "split"
  # (named parent rather than p to avoid shadowing Kernel#p)
  parent = node.parent

  # Create the new node
  new_node = node.document.create_element(parent.name)

  # Remove the nodes from the original position
  # and add them to the new node
  to_move.remove
  new_node << to_move

  # Insert the new node into the correct position
  parent.add_next_sibling(new_node)
end

This uses add_next_sibling, which ensures the new node is added in the correct position when the node being split itself has siblings.

Upvotes: 1

Arup Rakshit

Reputation: 118271

I would do it as below:

require 'nokogiri'

doc_string = <<-xml
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt>
        <time>2014-01-16T14:33:35.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T14:33:39.000Z</time>
      </trkpt>
      <trkpt>
        <time>2014-01-16T15:44:14.000Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
xml

doc = Nokogiri.XML(doc_string) do |config|
  config.default_xml.noblanks
end

# First, find the node at which to split (the last trkpt, as an example).
split_node = doc.at("//trkpt[last()]")

# Keep a reference to its parent, the node that is being split.
parent_node_of_split_node = split_node.parent

# Remove the split node from the document.
split_node.unlink

# Create a new <trkseg> node that will hold the split node.
new_node_to_add = Nokogiri::XML::Node.new('trkseg', doc)

# Add the split node as a child of the newly created <trkseg>.
new_node_to_add.add_child split_node

# Attach the new <trkseg> to the parent of the original <trkseg> (the <trk> node).
new_node_to_add.parent = parent_node_of_split_node.parent

puts doc.to_xml(:indent => 2)

# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <gpx>
# >>   <trk>
# >>     <trkseg>
# >>       <trkpt>
# >>         <time>2014-01-16T14:33:35.000Z</time>
# >>       </trkpt>
# >>       <trkpt>
# >>         <time>2014-01-16T14:33:39.000Z</time>
# >>       </trkpt>
# >>     </trkseg>
# >>     <trkseg>
# >>       <trkpt>
# >>         <time>2014-01-16T15:44:14.000Z</time>
# >>       </trkpt>
# >>     </trkseg>
# >>   </trk>
# >> </gpx>

Methods that I have used here: at, parent, unlink, Nokogiri::XML::Node.new, add_child, parent= and to_xml.

Upvotes: 0
