I'm trying to scrape the data from every table on the Hockey-Reference awards voting page. I can scrape the first table, for the Hart Memorial Trophy, but when I try any of the others I end up with empty vectors. I used SelectorGadget and the rvest package to produce the following code.
library(rvest)
url="https://www.hockey-reference.com/awards/voting-2017.html"
byng<-read_html(url)
byng_node<-html_nodes(byng, "#byng_stats .right , #byng_stats a")
byng_text<-html_text(byng_node)
However, once I run this code, I get no data in the byng variables:
> byng_node
{xml_nodeset (0)}
> byng_text
character(0)
What's happening here? Does SelectorGadget not work on pages with multiple tables, or is it unrelated to that and there's something about the HTML I don't understand? Any help is greatly appreciated!
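@neilfws was right: if you look at the source code of the HTML page, you'll see that every table except the first is wrapped in an HTML comment (<!-- ... -->). rvest therefore treats those tables as comments, not as part of the page's element tree, so your selectors match nothing. SelectorGadget works on the page as rendered in your browser, where the tables are visible, which is why it suggested those selectors. Let's do a dirty hack and remove the character sequences that comment out our precious tables: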
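library(rvest)
url <- "https://www.hockey-reference.com/awards/voting-2017.html"
byng <- read_html(url)
# Serialize the parsed page back to a single string
byng <- as.character(byng)
# Remove the comment markers so the hidden tables become real HTML
byng <- gsub("<!--", "", byng, fixed = TRUE)
byng <- gsub("-->", "", byng, fixed = TRUE)
# Re-parse the uncommented HTML
byng <- read_html(byng)
# Get tables as a list of data frames
tables <- html_table(byng)
# Last table
tables[7]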
[[1]]
                                                                       Scoring Scoring Scoring Scoring Goalie Stats Goalie Stats
1  Place            Player Age  Tm Pos Votes Vote% 1st 2nd 3rd 4th 5th       G       A     PTS     +/-            W            L
2      1    Connor McDavid  20 EDM   C   762 94.07 141  18   3   0   0      30      70     100      27
3      2     Sidney Crosby  29 PIT   C   526 64.94  20 142   0   0   0      44      45      89      17
4      3 Nicklas Backstrom  29 WSH   C   127 15.68   1   2 116   0   0      23      63      86      17
5      4    Mark Scheifele  23 WPG   C    21  2.59   0   0  21   0   0      32      50      82      18
6      5   Auston Matthews  19 TOR   C    10  1.23   0   0  10   0   0      40      29      69       2
7      6     Evgeni Malkin  30 PIT   C     4  0.49   0   0   4   0   0      33      39      72      18
8      7      John Tavares  26 NYI   C     2  0.25   0   0   2   0   0      28      38      66       4
9      8    Jonathan Toews  28 CHI   C     1  0.12   0   0   1   0   0      21      37      58       7
10     8     Brad Marchand  28 BOS   C     1  0.12   0   0   1   0   0      39      46      85      18
11     8       Ryan Kesler  32 ANA   C     1  0.12   0   0   1   0   0      22      36      58       8
12     8      Ryan Getzlaf  31 ANA   C     1  0.12   0   0   1   0   0      15      58      73       7
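To see the mechanism without hitting the live site, here is a minimal, self-contained sketch on a made-up toy page. The markup, the hart_stats/byng_stats ids, and the "Player A"/"Player B" values are hypothetical, chosen only to mirror the question; they are not Hockey-Reference's actual HTML. It shows that a table inside an HTML comment is invisible to html_table() and to CSS selectors, then recovers it two ways: the gsub() hack from the answer, and an alternative that parses only the text of the comment nodes (selected with the XPath //comment()).

```r
library(rvest)

# Hypothetical toy page: one normal table plus one table hidden inside an
# HTML comment, the way the answer describes the awards page
toy_html <- '<html><body>
<table id="hart_stats"><tr><th>Player</th></tr><tr><td>Player A</td></tr></table>
<!--
<table id="byng_stats"><tr><th>Player</th></tr><tr><td>Player B</td></tr></table>
-->
</body></html>'

page <- read_html(toy_html)

# The commented table is just comment text, not an element, so it is missed
visible <- html_table(page)
length(visible)                          # 1
length(html_nodes(page, "#byng_stats"))  # 0

# Option 1 (the answer's approach): strip the comment markers and re-parse
uncommented <- as.character(page)
uncommented <- gsub("<!--", "", uncommented, fixed = TRUE)
uncommented <- gsub("-->", "", uncommented, fixed = TRUE)
all_tables <- html_table(read_html(uncommented))
length(all_tables)                       # 2

# Option 2: take the text of every comment node and parse just that text
hidden <- html_nodes(page, xpath = "//comment()")
hidden_page <- read_html(paste(html_text(hidden), collapse = "\n"))
byng_table <- html_table(html_node(hidden_page, "#byng_stats"))
byng_table
```

The comment-node route parses only the hidden tables, so the first (never-commented) table still has to come from the original page. The gsub() route keeps everything in one document, but it also un-comments any other comments the page happens to contain.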