Reputation: 40001
I have an XML document (example below) and I need to split one node into two at a certain child node:
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<trk>
<trkseg>
<trkpt>
<time>2014-01-16T14:33:35.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T14:33:39.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T15:44:14.000Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
This is the result I want:
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<trk>
<trkseg>
<trkpt>
<time>2014-01-16T14:33:35.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T14:33:39.000Z</time>
</trkpt>
</trkseg> <-- this line is new
<trkseg> <-- this line is new
<trkpt>
<time>2014-01-16T15:44:14.000Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
This XML is simplified; in reality there are thousands of trkpt elements.
I have no problem finding where to do the split using Nokogiri, but I don't know how to actually perform the split.
Upvotes: 1
Views: 252
Reputation: 79743
You may find this easier if you think in terms of nodes of the parsed data structure rather than textual XML elements.
In this case you want to add a new trkseg node after the first, then remove the last trkpt node and move it to this new node. Something like this should work:
require 'nokogiri'

# the_original_xml holds the XML string from the question
d = Nokogiri.XML(the_original_xml)
# find the node to move and remove it from its current position
trkpt3 = d.at_xpath("//trkpt[3]")
trkpt3.remove
# create a new node of type trkseg
new_node = d.create_element("trkseg")
# add the trkpt3 node to this new node
new_node.add_child(trkpt3)
# add the new node into position as a child of the trk node
d.at_xpath("//trk").add_child(new_node)
The actual result of this isn’t quite the same as what you’re after, as it doesn’t account for the whitespace nodes, but otherwise the structure is the same – it looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<trk>
<trkseg>
<trkpt>
<time>2014-01-16T14:33:35.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T14:33:39.000Z</time>
</trkpt>
</trkseg>
<trkseg><trkpt>
<time>2014-01-16T15:44:14.000Z</time>
</trkpt></trkseg></trk>
</gpx>
If the exact whitespace matters, you can be more precise about how you reconstruct the document to get exactly the result you need.
You'll probably need different XPath queries than this in a real situation, but the general idea of manipulating the DOM structure with methods like remove, add_child, <<, create_element, and create_text_node is what you need.
Here’s an example of a method you can use to split a node in two, with the split happening after the node passed in as an argument:
def split_after(node)
  # Find all the nodes to be moved to the new node
  to_move = node.xpath("following-sibling::node()")
  # The parent node, this is the node that will be "split"
  parent = node.parent
  # Create the new node
  new_node = node.document.create_element(parent.name)
  # Remove the nodes from the original position
  # and add them to the new node
  to_move.remove
  new_node << to_move
  # Insert the new node into the correct position
  parent.add_next_sibling(new_node)
end
This uses add_next_sibling, which ensures the new node is added in the correct position when the node being split itself has siblings.
Upvotes: 1
Reputation: 118271
I would do it as below:
require 'nokogiri'
doc_string = <<-xml
<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<trk>
<trkseg>
<trkpt>
<time>2014-01-16T14:33:35.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T14:33:39.000Z</time>
</trkpt>
<trkpt>
<time>2014-01-16T15:44:14.000Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
xml
doc = Nokogiri.XML(doc_string) do |config|
config.default_xml.noblanks
end
# Find the node at which to split (here, the last trkpt, as an example).
split_node = doc.at("//trkpt[last()]")
# Remember the parent (<trkseg>) of the split node.
parent_node_of_split_node = split_node.parent
# Detach the split node from the document.
split_node.unlink
# Create a new <trkseg> node, which will hold the split node
# as a child node.
new_node_to_add = Nokogiri::XML::Node.new('trkseg', doc)
# Add the split node as a child of the newly created <trkseg>.
new_node_to_add.add_child split_node
# Attach the new <trkseg> to <trk>, the parent of the original <trkseg>.
new_node_to_add.parent = parent_node_of_split_node.parent
puts doc.to_xml(:indent => 2)
# >> <?xml version="1.0" encoding="UTF-8"?>
# >> <gpx>
# >> <trk>
# >> <trkseg>
# >> <trkpt>
# >> <time>2014-01-16T14:33:35.000Z</time>
# >> </trkpt>
# >> <trkpt>
# >> <time>2014-01-16T14:33:39.000Z</time>
# >> </trkpt>
# >> </trkseg>
# >> <trkseg>
# >> <trkpt>
# >> <time>2014-01-16T15:44:14.000Z</time>
# >> </trkpt>
# >> </trkseg>
# >> </trk>
# >> </gpx>
Methods that I have used here:
Nokogiri::XML::Node#at
Nokogiri::XML::Node#to_xml
Nokogiri::XML::Node#parent
Nokogiri::XML::Node#unlink
Nokogiri::XML::Node#add_child
Nokogiri::XML::Node#parent=
Upvotes: 0