Flex

Reputation: 11

Extract all parents from a doc with Nokogiri

I have a document like this:

<DL><a lot of tags>...<H3>Entry 1</H3><a lot of tags>...</DL>
<DL><a lot of tags>...<H3>Entry 2</H3><a lot of tags>...
    <DL><a lot of tags>...<H3>Entry 21</H3><a lot of tags>...
        <DL><a lot of tags>...<H3>Entry 211</H3><a lot of tags>...</DL>
    </DL>
</DL>
<DL><a lot of tags>...<H3>Entry 3</H3><a lot of tags>...</DL>

I want to find every entry, which is easy with the following code:

require 'nokogiri'

@doc = Nokogiri::HTML(@file)
@doc.css('DL>h3').each { |node| puts node.text }

How can I extract the list of parent H3 entries for any given entry? I'd like a method such as 'parent' that returns the relationship, e.g.: entry211.parent ==> /Entry 2/Entry 21/

Upvotes: 1

Views: 509

Answers (1)

Jakob S

Reputation: 20145

If you simply want the parent element of each h3 element

@doc.css('DL>h3').collect(&:parent)

should do the trick.

However, it looks like you might want all h3 elements that are children of a dl element that is an ancestor of an h3 element. If I've understood that and your structure correctly, you should be able to do

@doc.css('dl>h3').collect { |h3| h3.ancestors('dl').css('h3') }

This gives you an Array containing, for each h3 element, a NodeSet with the h3 elements that are descendants of the dl elements in that h3 element's ancestry. Confused? I sure am :)

For example, using your sample HTML the result for the Entry 211 h3 is

@doc.css('dl>h3').collect { |h3| h3.ancestors('dl').css('h3') }[3].collect(&:text)
#=> ["Entry 211", "Entry 21", "Entry 2"]

Is this close enough to what you want?

Upvotes: 1
