JohnSmith1976

Reputation: 686

How to retrieve only elements that are not nested?

I am trying to parse some XML content, in this case a list of products:

<PRODUCTS>
  <PRODUCT>
    <NAME><![CDATA[Some name]]></NAME>
    <CATEGORIES>
      <CATEGORY>
        <NAME><![CDATA[Category 1]]></NAME>
      </CATEGORY>
      <CATEGORY>
        <NAME><![CDATA[Category 2]]></NAME>
      </CATEGORY>
    </CATEGORIES>
  </PRODUCT>
  <PRODUCT>
    <NAME><![CDATA[Some other name]]></NAME>
    <CATEGORIES>
      <CATEGORY>
        <NAME><![CDATA[Category 1]]></NAME>
      </CATEGORY>
      <CATEGORY>
        <NAME><![CDATA[Category 2]]></NAME>
      </CATEGORY>
    </CATEGORIES>
  </PRODUCT>
</PRODUCTS>

If I parse the above into a doc variable and select the NAME inside each product:

doc.css("PRODUCT").each do |product|
  puts product.css("NAME").size # => 3
end

I also get the NAME elements nested inside CATEGORIES, not just the product's own NAME.

How do I get only the NAME that is not nested, i.e. the one that is a direct child of PRODUCT? I know that product.at_css("NAME") returns only the first matching element, but I am not asking how to get the first element; I want the elements that are not nested.

Upvotes: 2

Views: 77

Answers (3)

Michael Kohl

Reputation: 66867

You can use the child combinator > to select only the NAME elements that are direct children of each PRODUCT:

doc.css("PRODUCT").each do |product|
  puts product.css("> NAME")
end

This will output the following:

<NAME><![CDATA[Some name]]></NAME>
<NAME><![CDATA[Some other name]]></NAME>

Upvotes: 2

Bartosz Pietraszko

Reputation: 1407

Using XPath:

doc.xpath("PRODUCTS/PRODUCT").each do |product| 
  puts product.xpath("NAME").first
end

In this case .xpath("NAME") returns only the direct children of product, because a relative XPath step like NAME is short for child::NAME. The same effect can be achieved with the CSS child selector:

doc.css("PRODUCT").each do |product| 
  puts product.css("> NAME").first
end

Upvotes: 0

Makushimaru

Reputation: 111

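You can use the following (this works because css returns matches in document order and the product's own NAME comes before the nested ones):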

doc.css("PRODUCT").each do |product|
   puts product.css("NAME").first
end

Upvotes: 0
