PiperWarrior

Reputation: 191

Accessing Nokogiri element children

After parsing an HTML table, I can get the first row of the table as a Nokogiri element.

2.2.1 :041 > pp content[1]; nil
#(Element:0x3feee917d1e0 {
  name = "tr",
  children = [
    #(Element:0x3feee917cfd8 {
      name = "td",
      attributes = [
        #(Attr:0x3feee917cf74 { name = "valign", value = "top" })],
      children = [
        #(Element:0x3feee917ca60 {
          name = "a",
          attributes = [
            #(Attr:0x3feee917c9fc {
              name = "href",
              value = "/cgi-bin/own-disp?action=getowner&CIK=0001513362"
              })],
          children = [ #(Text "Maestri Luca")]
          })]
      }),
    #(Text "\n"),
    #(Element:0x3feee917c150 {
      name = "td",
      children = [
        #(Element:0x3feee917d794 {
          name = "a",
          attributes = [
            #(Attr:0x3feee9179fb8 {
              name = "href",
              value = "/cgi-bin/browse-edgar?action=getcompany&CIK=0001513362"
              })],
          children = [ #(Text "0001513362")]
          })]
      }),
    #(Text "\n"),
    #(Element:0x3feee91796a8 {
      name = "td",
      children = [ #(Text "2016-09-04")]
      }),
    #(Text "\n"),
    #(Element:0x3feee9179194 {
      name = "td",
      children = [ #(Text "officer: Senior Vice President, CFO")]
      }),
    #(Text "\n")]
  })
 => nil 

This is the content from the row:

Maestri Luca 0001513362 2016-09-04 officer: Senior Vice President, CFO

I need to access the Name, Number, Date and Title from the Nokogiri element.

One way of doing it is as below:

2.2.1 :042 > pp content[1].text; nil
"Maestri Luca\n0001513362\n2016-09-04\nofficer: Senior Vice President, CFO\n"

However, I am looking for a way to access the elements individually, not as one long string with newline characters. How can I do it?

Upvotes: 0

Views: 605

Answers (1)

Amadan

Reputation: 198324

name, number, date, title = *content[1].css('td').map(&:text)

If content[1] is a tr, content[1].css('td') finds all td elements beneath it, and .map(&:text) calls td.text on each of them and collects the results into an array. We then splat that array with * so we can do multiple assignment. (The * is optional here, since Ruby's multiple assignment destructures an array on its own.)

(Note: next time, please include the original HTML fragment, not the Nokogiri node inspect result.)

Upvotes: 1
