Reputation: 6475
I'm fetching the second table from this page, parsing it, and trying to generate hashes from the data. The problem is that the objects are separated by gray TR rows, but so far I can only get every single TR from the table.
How can I select just the TR rows that sit between the gray ones?
For now I'm using this line to get each TR:
parsed_html.css("table")[1].css("tr")
EDIT:
I don't know if a Hash is the right structure for this task, but here is the JSON I'd expect for the "LIFTING AND SHORING" section (this is just a sample, so feel free to correct me):
{
"chapter":"07",
"title":"LIFTING AND SHORING",
"description":"This chapter shall...",
"section":[
{
"number":"00",
"title":"GENERAL",
"description":""
},
{
"number":"10",
"title":"JACKING",
"description":"Provides information relative..."
},
{
"number":"20",
"title":"SHORING",
"description":"Those instructions necessary..."
}
]
}
Upvotes: 0
Views: 103
Reputation: 1332
Assuming you're using Nokogiri, I'd do something like:
#!/usr/bin/env ruby
require 'nokogiri'
require 'open-uri'
require 'pp'
# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
d = doc.css("table")[1] # the second table on the page
array = []
d.css('tr').each do |r|
  tds = r.css("td") # extract the td elements from this tr
  array << tds.map { |td| td.text.strip } # one array of cell texts per row
end
pp array
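To get from that flat array to the nested structure in the question, the rows still need to be grouped between the separators. Here is a minimal sketch of one way to do it with Enumerable#chunk. It rests on assumptions I haven't checked against the real page: that a gray separator row comes out as a row with no cells or only blank cells, that the first row of each group is the chapter, and that the columns are number, title, description. The names separator? and build_chapters are made up for this example, and the second chapter in the sample data is invented just to show the split.

```ruby
require 'json'

# Hypothetical sample in the shape the loop above produces: one array of
# cell texts per <tr>. The first group reuses the values from the question;
# the "99" group is invented for illustration.
rows = [
  ['07', 'LIFTING AND SHORING', 'This chapter shall...'],
  ['00', 'GENERAL', ''],
  ['10', 'JACKING', 'Provides information relative...'],
  ['20', 'SHORING', 'Those instructions necessary...'],
  ['', '', ''], # assumed shape of a gray separator row
  ['99', 'EXAMPLE CHAPTER', 'Made-up chapter...'],
  ['00', 'GENERAL', '']
]

# Assumption: a separator row has no cells, or only blank ones.
def separator?(row)
  row.all? { |cell| cell.strip.empty? }
end

# Split the rows at separators, then treat the first row of each group as
# the chapter and the remaining rows as its sections.
def build_chapters(rows)
  # chunk drops elements whose key is :_separator and starts a new chunk
  # after them, so each yielded group is one run of data rows.
  groups = rows.chunk { |row| separator?(row) ? :_separator : :data }
  groups.map do |_key, group|
    head, *sections = group
    {
      'chapter' => head[0],
      'title' => head[1],
      'description' => head[2].to_s,
      'section' => sections.map do |number, title, description|
        { 'number' => number, 'title' => title, 'description' => description.to_s }
      end
    }
  end
end

chapters = build_chapters(rows)
puts JSON.pretty_generate(chapters)
```

If the gray rows turn out to be marked differently (for example with a bgcolor attribute on the tr), the check would have to move into the Nokogiri loop, where the attribute is still available as r['bgcolor'], and only separator? would need to change.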
Upvotes: 1