Gru

Reputation: 95

How to scrape using rvest in pages with multiple tables

I'm trying to scrape the data from every table at the hockey-reference awards page. I can scrape the first table for the Hart Memorial Trophy, but when I try the rest of them, I end up with empty vectors. I used Selector Gadget and the rvest package to produce the following code.

library(rvest)
url="https://www.hockey-reference.com/awards/voting-2017.html"
byng<-read_html(url)
byng_node<-html_nodes(byng, "#byng_stats .right , #byng_stats a")
byng_text<-html_text(byng_node)

However, once I run this code, I get no data in the byng variables:

> byng_node
{xml_nodeset (0)}
> byng_text
character(0)

What's happening here? Does Selector Gadget not work for pages with multiple tables? Or is it unrelated to that, and there's something HTMLy I don't understand? Any help is greatly appreciated!

Upvotes: 2

Views: 520

Answers (1)

Alex Knorre

Reputation: 630

@neilfws was right: if you look at the page's HTML source, you'll see that every table except the first is wrapped in an HTML comment (<!-- ... -->). rvest therefore parses them as comment nodes rather than table elements, and your selector matches nothing. Let's do a dirty hack and strip the comment delimiters that hide our precious tables:

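library(rvest)
url <- "https://www.hockey-reference.com/awards/voting-2017.html"
byng <- read_html(url)
# Convert the parsed document back to a single HTML string
byng <- as.character(byng)
# Remove the comment delimiters so the hidden tables become real markup
byng <- gsub("<!--", "", byng, fixed = TRUE)
byng <- gsub("-->", "", byng, fixed = TRUE)
# Re-parse the cleaned HTML
byng <- read_html(byng)
# Get all tables as a list of data frames
tables <- html_table(byng)
# Last table
tables[7]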
[[1]]
                                                                         Scoring Scoring Scoring Scoring Goalie Stats Goalie Stats
1  Place              Player Age  Tm Pos Votes Vote% 1st 2nd 3rd 4th 5th       G       A     PTS     +/-            W            L
2      1      Connor McDavid  20 EDM   C   762 94.07 141  18   3   0   0      30      70     100      27                          
3      2       Sidney Crosby  29 PIT   C   526 64.94  20 142   0   0   0      44      45      89      17                          
4      3   Nicklas Backstrom  29 WSH   C   127 15.68   1   2 116   0   0      23      63      86      17                          
5      4      Mark Scheifele  23 WPG   C    21  2.59   0   0  21   0   0      32      50      82      18                          
6      5     Auston Matthews  19 TOR   C    10  1.23   0   0  10   0   0      40      29      69       2                          
7      6       Evgeni Malkin  30 PIT   C     4  0.49   0   0   4   0   0      33      39      72      18                          
8      7        John Tavares  26 NYI   C     2  0.25   0   0   2   0   0      28      38      66       4                          
9      8      Jonathan Toews  28 CHI   C     1  0.12   0   0   1   0   0      21      37      58       7                          
10     8       Brad Marchand  28 BOS   C     1  0.12   0   0   1   0   0      39      46      85      18                          
11     8         Ryan Kesler  32 ANA   C     1  0.12   0   0   1   0   0      22      36      58       8                          
12     8        Ryan Getzlaf  31 ANA   C     1  0.12   0   0   1   0   0      15      58      73       7 
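
If stripping every comment delimiter from the page feels too blunt, another option is to leave the document alone and parse only the text inside its comment nodes. The sketch below is not run against the live site: it uses a small made-up page (the `page_source` string, the `all_byng` wrapper, and the "Player A"/"Player B" rows are all invented for illustration) that mimics the layout described above, with one live table and one table hidden in a comment. Only `#byng_stats` is taken from the question.

```r
library(rvest)
library(xml2)

# Toy stand-in for the real page (hypothetical content): the first table
# is live markup, the second is wrapped in an HTML comment.
page_source <- '
<html><body>
  <div id="all_hart"><table id="hart_stats">
    <tr><th>Player</th><th>Votes</th></tr>
    <tr><td>Player A</td><td>20</td></tr>
  </table></div>
  <div id="all_byng">
  <!--
    <table id="byng_stats">
      <tr><th>Player</th><th>Votes</th></tr>
      <tr><td>Player B</td><td>10</td></tr>
    </table>
  -->
  </div>
</body></html>'

page <- read_html(page_source)

# Diagnostic: only the uncommented table is visible to CSS selectors.
length(html_nodes(page, "table"))        # 1
length(html_nodes(page, "#byng_stats"))  # 0

# Collect the text of every comment node, then parse that text as HTML.
comment_text <- html_text(html_nodes(page, xpath = "//comment()"))
hidden <- read_html(paste(comment_text, collapse = ""))

# The original selector now finds the table inside the parsed comments.
byng_table <- html_table(html_node(hidden, "#byng_stats"))
byng_table
```

On the real page, the same idea would presumably be to replace `page_source` with the URL from the question. The two `length()` checks are also a quick way to confirm that commented-out markup is the reason a selector comes back empty.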

Upvotes: 1
