JohnSmith1976

Reputation: 666

xpath challenge: How to merge multiple results into one result

I use Ruby 1.9.3p385 and Nokogiri to parse XML files. I'm not quite sure which XPath version I'm using, but it accepts XPath 1.0 syntax and functions, not XPath 2.0 syntax.

I have this XML file:

<root_tag>
  <middle_tag>
    <item_tag>
      <headline_1>
        <tag_1>Product title 1</tag_1>
      </headline_1>
      <headline_2>
        <tag_2>Product attribute 1</tag_2>
      </headline_2>
    </item_tag>
    <item_tag>
      <headline_1>
        <tag_1>Product title 2</tag_1>
      </headline_1>
      <headline_2>
        <tag_2>Product attribute 2</tag_2>
      </headline_2>
    </item_tag>
  </middle_tag>
</root_tag>

I want to extract all the products, and for that I am using this code:

products = xml_file.xpath("/root_tag/middle_tag/item_tag/headline_1|/root_tag/middle_tag/item_tag/headline_2")

puts products.size # => 4

Printing each match with:

products.each_with_index do |product, i|
  puts "product #{i}:"
  puts product
end

gives this:

product 0:
<headline_1>
  <tag_1>Product title 1</tag_1>
</headline_1>
product 1:
<headline_2>
  <tag_2>Product attribute 1</tag_2>
</headline_2>
product 2:
<headline_1>
  <tag_1>Product title 2</tag_1>
</headline_1>
product 3:
<headline_2>
  <tag_2>Product attribute 2</tag_2>
</headline_2>

I need my code to merge the matches belonging to the same item into one result (so products.size should be 2). The final output should look something like this:

product 0:
<headline_1>
  <tag_1>Product title 1</tag_1>
</headline_1>
<headline_2>
  <tag_2>Product attribute 1</tag_2>
</headline_2>
product 1:
<headline_1>
  <tag_1>Product title 2</tag_1>
</headline_1>
<headline_2>
  <tag_2>Product attribute 2</tag_2>
</headline_2>

I have searched extensively, but every variation I have tried, e.g.:

products = xml_file.xpath("/root_tag/middle_tag/item_tag/*[self::headline_1|self::headline_2]")

produces the same result.

Am I missing some important point in xpath, or am I overlooking something?

Upvotes: 3

Views: 1146

Answers (1)

Jens Erat

Reputation: 38722

XPath only returns flat results (a node-set in XPath 1.0, a sequence in XPath 2.0), so there is nothing like nested subsequences. You have to wrap each "product" in some XML element. Fortunately there already is such an element (<item_tag/>), so the code is rather simple:

products = doc.xpath("//item_tag")
products.each_with_index do |product, i|
  puts "product #{i}:"
  product.children.each do |line|
    puts line
  end
end

The output is (it probably needs some more formatting, but I'm not very familiar with Ruby and can't help you with that):

product 0:

<headline_1>
        <tag_1>Product title 1</tag_1>
      </headline_1>

<headline_2>
        <tag_2>Product attribute 1</tag_2>
      </headline_2>

product 1:

<headline_1>
        <tag_1>Product title 2</tag_1>
      </headline_1>

<headline_2>
        <tag_2>Product attribute 2</tag_2>
      </headline_2>

To address all <headline_n/>-tags, you can also use //*[starts-with(local-name(), 'headline')] to make the code more flexible.

Upvotes: 3
