Reputation: 9335
I have a Rake task set up, and it works almost the way I want it to.
I'm scraping information from a site and want to get all of the player ratings into an array, ordered by how they appear in the HTML. I have player_ratings and want to do exactly what I did with the player_names variable.
I only want the fourth <td> within a <tr> in the specified part of the doc because that corresponds to the ratings. If I use Nokogiri's text, I only get the first player rating when I really want an array of all of them.
task :update => :environment do
  require "nokogiri"
  require "open-uri"
  team_ids = [7689, 7679, 7676, 7680]
  player_names = []
  for team_id in team_ids do
    url = URI.encode("http://modules.ussquash.com/ssm/pages/leagues/Team_Information.asp?id=#{team_id}")
    doc = Nokogiri::HTML(open(url))
    player_names = doc.css('.table.table-bordered.table-striped.table-condensed')[1].css('tr td a').map(&:content)
    player_ratings = doc.css('.table.table-bordered.table-striped.table-condensed')[1].css('tr td')[3]
    puts player_ratings
    player_names.map{|player| puts player}
  end
end
Any advice on how to do this?
Upvotes: 0
Views: 139
Reputation: 160551
It's not well known, but Nokogiri implements some of jQuery's extensions to CSS selectors. In your case, the :eq(n) selector will be useful. Unlike jQuery's zero-based :eq, the index here counts from 1:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<html>
<body>
<table>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
</table>
</body>
</html>
EOT
doc.at('td:eq(4)').text # => "4"
Upvotes: 0
Reputation: 2717
I think changing your XPath might help. Here is the XPath:
nodes = doc.xpath "//table[@class='table table-bordered table-striped table-condensed'][2]//tr/td[4]"
data = nodes.map {|node| node.text }
Mapping the nodes with node.text gives me
4.682200
5.439000
5.568400
5.133700
4.480800
4.368700
2.768100
3.814300
5.103400
4.567000
5.103900
3.804400
3.737100
4.742400
Upvotes: 1
Reputation: 10740
I'd recommend using Wombat (https://github.com/felipecsl/wombat). You can specify that you want to retrieve a list of elements matched by your CSS selector, and it will do all the hard work for you.
Upvotes: 0