cojoj

Creating hashes from parsed HTML

I'm taking the second table from this page, parsing it, and trying to generate hashes from its data. The problem is that each object is separated from the next by a grey TR, but so far I've only managed to get every single TR in the table.

How can I group the TRs that lie between the grey ones, so that each group becomes one object?
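One way to group rows between separators is Ruby's `Enumerable#slice_before`, which starts a new chunk each time its block returns true. The sketch below assumes each `tr` has already been reduced to a hypothetical `Row` struct with a `grey` flag and the stripped cell texts. How the grey flag is detected depends on whatever actually marks those rows in the page's markup (perhaps a `bgcolor` attribute or a class on the `<tr>`; that part is a guess). The sample rows are made up for illustration, loosely based on the "LIFTING AND SHORING" example in this question.

```ruby
# Hypothetical row representation: `grey` is true for the grey separator
# rows, `cells` holds the stripped text of each <td> in the row.
Row = Struct.new(:grey, :cells)

# Split a flat list of rows into groups. A new group starts at every grey
# row; rows before the first grey row (e.g. a column-header row) are dropped.
def group_by_separator(rows)
  rows.slice_before(&:grey).select { |group| group.first.grey }
end

# Made-up sample rows for illustration only.
rows = [
  Row.new(false, ["Number", "Title", "Description"]),
  Row.new(true,  ["07", "LIFTING AND SHORING", "This chapter shall..."]),
  Row.new(false, ["00", "GENERAL", ""]),
  Row.new(false, ["10", "JACKING", "Provides information relative..."]),
  Row.new(true,  ["08", "NEXT CHAPTER", ""]),
  Row.new(false, ["00", "GENERAL", ""])
]

groups = group_by_separator(rows)
groups.each do |group|
  header, *sections = group
  # Print each chapter title followed by the titles of its section rows
  puts "#{header.cells[1]}: #{sections.map { |s| s.cells[1] }.join(', ')}"
end
```

In a real Nokogiri loop you would build each `Row` from a `tr` node instead of hard-coding them; each resulting group then holds one grey row plus the rows that belong to it.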

For now I'm using this line to get each TR:

parsed_html.css("table")[1].css("tr")


EDIT:
I don't know whether a Hash is a good fit for this task, but here is sample JSON for the "LIFTING AND SHORING" section (feel free to correct me):

{
  "chapter":"07",
  "title":"LIFTING AND SHORING",
  "description":"This chapter shall...",
  "section":[
    {
      "number":"00",
      "title":"GENERAL",
      "description":""
    },

    {
      "number":"10",
      "title":"JACKING",
      "description":"Provides information relative..."
    },

    {
      "number":"20",
      "title":"SHORING",
      "description":"Those instructions necessary..."
    }
  ]
}
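Once a chapter's rows are grouped, producing the structure above is mostly a matter of mapping cells to keys, and letting Ruby's standard `json` library write the output avoids hand-written syntax mistakes. This sketch assumes, hypothetically, that the grey chapter row and every section row each contain exactly three cells (number, title, description, in that order); the page's real column layout may differ. `chapter_hash` is an illustrative helper name, not an existing API.

```ruby
require 'json'

# Build one chapter hash (shaped like the sample JSON) from a group of rows.
# Assumption: every row is [number, title, description]; the first row is
# the grey chapter row and the remaining rows are its sections.
def chapter_hash(group)
  chapter_row, *section_rows = group
  number, title, description = chapter_row
  {
    "chapter" => number,
    "title" => title,
    "description" => description,
    "section" => section_rows.map { |num, sec_title, sec_desc|
      { "number" => num, "title" => sec_title, "description" => sec_desc }
    }
  }
end

group = [
  ["07", "LIFTING AND SHORING", "This chapter shall..."],
  ["00", "GENERAL", ""],
  ["10", "JACKING", "Provides information relative..."],
  ["20", "SHORING", "Those instructions necessary..."]
]

puts JSON.pretty_generate(chapter_hash(group))
```

Mapping `chapter_hash` over every group gives an array of chapter hashes, which can be serialized the same way.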

Answers (1)

Anko

Assuming you're using Nokogiri, I'd do something like this:

#!/usr/bin/env ruby

require 'nokogiri'
require 'open-uri'
require 'pp'

# URI.open comes from open-uri; on Ruby 3.0+ plain Kernel#open no longer opens URLs
doc = Nokogiri::HTML(URI.open('http://www.s-techent.com/ATA100.htm'))

table = doc.css("table")[1] # the second table on the page

array = []

table.css('tr').each do |r|
    tds = r.css("td") # extract the td elements from this tr

    array << tds.map { |td| td.text.strip } # one array of cell texts per row
end

pp array