anon_swe

Reputation: 9335

How to create an array scraping HTML?

I have a Rake task set up, and it works almost the way I want it to.

I'm scraping information from a site and want to get all of the player ratings into an array, ordered by how they appear in the HTML. I have player_ratings and want to do exactly what I did with the player_names variable.

I only want the fourth <td> within a <tr> in the specified part of the doc because that corresponds to the ratings. If I use Nokogiri's text, I only get the first player rating when I really want an array of all of them.

task :update => :environment do
  require "nokogiri"
  require "open-uri"

  team_ids = [7689, 7679, 7676, 7680]
  player_names = []

  for team_id in team_ids do
    url = URI.encode("http://modules.ussquash.com/ssm/pages/leagues/Team_Information.asp?id=#{team_id}")
    doc = Nokogiri::HTML(open(url))
    player_names = doc.css('.table.table-bordered.table-striped.table-condensed')[1].css('tr td a').map(&:content)
    player_ratings = doc.css('.table.table-bordered.table-striped.table-condensed')[1].css('tr td')[3]
    puts player_ratings
    player_names.map { |player| puts player }
  end
end

Any advice on how to do this?

Upvotes: 0

Views: 139

Answers (3)

the Tin Man

Reputation: 160551

It's not well known, but Nokogiri implements some of jQuery's extensions to CSS selectors. In your case, the :eq(n) selector will be useful:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<html>
<body>
  <table>
    <tr>
      <td>1</td>
      <td>2</td>
      <td>3</td>
      <td>4</td>
    </tr>
  </table>
</body>
</html>
EOT

doc.at('td:eq(4)').text # => "4"

Upvotes: 0

bsd

Reputation: 2717

I think switching to XPath might help. Here is the expression:

nodes = doc.xpath "//table[@class='table table-bordered table-striped table-condensed'][2]//tr/td[4]"

data = nodes.map { |node| node.text }

Printing node.text for each node gives me:

4.682200 
5.439000 
5.568400 
5.133700 
4.480800 
4.368700 
2.768100 
3.814300 
5.103400 
4.567000 
5.103900 
3.804400 
3.737100 
4.742400 

Upvotes: 1

Felipe Lima

Reputation: 10740

I'd recommend using Wombat (https://github.com/felipecsl/wombat). It lets you specify that you want a list of all elements matched by your CSS selector, and it does the hard work for you.

Upvotes: 0
