Reputation: 666
I am using Ruby 1.9.3p385 with Nokogiri to parse XML files. I am not sure which XPath version this gives me, but it accepts XPath 1.0 syntax and functions and rejects XPath 2.0 syntax.
I have this XML file:
<root_tag>
<middle_tag>
<item_tag>
<headline_1>
<tag_1>Product title 1</tag_1>
</headline_1>
<headline_2>
<tag_2>Product attribute 1</tag_2>
</headline_2>
</item_tag>
<item_tag>
<headline_1>
<tag_1>Product title 2</tag_1>
</headline_1>
<headline_2>
<tag_2>Product attribute 2</tag_2>
</headline_2>
</item_tag>
</middle_tag>
</root_tag>
I want to extract all the products, and for that I am using this code:
products = xml_file.xpath("/root_tag/middle_tag/item_tag/headline_1|/root_tag/middle_tag/item_tag/headline_2")
puts products.size # => 4
If you look at the output, using:
products.each_with_index do |product, i|
  puts "product #{i}:"
  puts product
end
you get this:
product 0:
<headline_1>
<tag_1>Product title 1</tag_1>
</headline_1>
product 1:
<headline_2>
<tag_2>Product attribute 1</tag_2>
</headline_2>
product 2:
<headline_1>
<tag_1>Product title 2</tag_1>
</headline_1>
product 3:
<headline_2>
<tag_2>Product attribute 2</tag_2>
</headline_2>
I need my code to join/merge all matches into the same result (so products.size should be 2). The final output should look something like this:
product 0:
<headline_1>
<tag_1>Product title 1</tag_1>
</headline_1>
<headline_2>
<tag_2>Product attribute 1</tag_2>
</headline_2>
product 1:
<headline_1>
<tag_1>Product title 2</tag_1>
</headline_1>
<headline_2>
<tag_2>Product attribute 2</tag_2>
</headline_2>
I have searched extensively, but every variation I tried, e.g.:
products = xml_file.xpath("/root_tag/middle_tag/item_tag/*[self::headline_1|self::headline_2]")
seems to produce the same result.
Am I missing some important point about XPath, or am I overlooking something?
Upvotes: 3
Views: 1146
Reputation: 38722
XPath only knows flat sequences (node-sets in XPath 1.0), so there is no such thing as a nested subsequence. You will have to wrap each "product" in some XML element. Luckily, we already have such an element (<item_tag/>), so the code is rather simple:
products = doc.xpath("//item_tag")
products.each_with_index do |product, i|
  puts "product #{i}:"
  # element_children skips the whitespace-only text nodes between tags
  product.element_children.each do |line|
    puts line
  end
end
The output is (it probably needs some more formatting, but I'm not used to Ruby and can't help you with that):
product 0:
<headline_1>
<tag_1>Product title 1</tag_1>
</headline_1>
<headline_2>
<tag_2>Product attribute 1</tag_2>
</headline_2>
product 1:
<headline_1>
<tag_1>Product title 2</tag_1>
</headline_1>
<headline_2>
<tag_2>Product attribute 2</tag_2>
</headline_2>
To address all <headline_n/> tags, you can also use //*[starts-with(local-name(), 'headline')] to make the code more flexible.
Upvotes: 3