Reputation: 99
I'm trying to get the data table from this webpage: http://rotoguru1.com/cgi-bin/fyday.pl?week=1&game=dk&scsv=1
This is the XPath of the data I want to extract: /html/body/table/tbody/tr/td[3]/pre.
I have tried:
url <- "http://rotoguru1.com/cgi-bin/fyday.pl?week=1&game=dk&scsv=1"
DFS_table <- read_html(url) %>%
  html_nodes(xpath = '/html/body/table/tbody/tr/td[3]/pre') %>%
  html_table()
DFS_table <- DFS_table[[1]]
But I get this error: Error in DFS_table[[1]] : subscript out of bounds.
When I try:
url <- "http://rotoguru1.com/cgi-bin/fyday.pl?week=1&game=dk&scsv=1"
pg <- read_html(url)
tab <- html_table(pg, fill = TRUE)[[1]]
It seems to get all the data displayed on the web page, so I think my problem is that the whole page is a table and I need to extract only part of it, but I am unsure how to do that.
Any help is appreciated.
Upvotes: 1
Views: 865
Reputation: 33772
I assume that you want the data under the line "Semi-colon delimited format:". This is preformatted text, so trying to extract a table node won't work.
You can get that data straight into a data frame like this. Note the quote = argument to read.table, required because some player names contain single quotes.
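library(rvest)  # provides read_html(), html_node(), html_text() and the %>% pipe
url <- "http://rotoguru1.com/cgi-bin/fyday.pl?week=1&game=dk&scsv=1"
mydata <- read_html(url) %>%
  html_node("pre") %>%   # first <pre> element on the page
  html_text() %>%        # its contents as a single string
  read.table(text = ., sep = ";", header = TRUE, quote = "")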
head(mydata)
  Week Year  GID                Name Pos Team h.a Oppt DK.points DK.salary
1    1 2021 1523 Mahomes II, Patrick  QB  kan   h  cle     36.28      8100
2    1 2021 1537       Murray, Kyler  QB  ari   a  ten     34.56      7600
3    1 2021 1490         Goff, Jared  QB  det   h  sfo     32.92      5100
4    1 2021 1131          Brady, Tom  QB  tam   h  dal     32.16      6700
5    1 2021 1501       Prescott, Dak  QB  dal   a  tam     31.42      6200
6    1 2021 1465     Winston, Jameis  QB  nor   h  gnb     29.62      5200
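To see the effect of quote = "" without hitting the website, here is a minimal offline sketch. The sample text only imitates the page's layout; the second data row (GID 9999, "O'Example, Pat") is made up for illustration and is not taken from the site.

```r
# Made-up sample in the same semicolon-delimited layout as the <pre> block.
# The second data row is hypothetical; it exists only to include an apostrophe.
sample_text <- paste(
  "Week;Year;GID;Name;Pos;Team;h/a;Oppt;DK points;DK salary",
  "1;2021;1523;Mahomes II, Patrick;QB;kan;h;cle;36.28;8100",
  "1;2021;9999;O'Example, Pat;RB;abc;a;xyz;10.5;5000",
  sep = "\n"
)

# quote = "" turns off quote handling, so the apostrophe is kept as plain text
# instead of being read as the start of a quoted field.
parsed <- read.table(text = sample_text, sep = ";", header = TRUE,
                     quote = "", stringsAsFactors = FALSE)

nrow(parsed)   # number of data rows read
parsed$Name    # names, apostrophe included
```

If the names "h/a", "DK points" and "DK salary" should be kept exactly as they appear on the page rather than converted to h.a, DK.points and DK.salary, adding check.names = FALSE to the read.table call is one option.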
Upvotes: 1
Reputation: 3604
html_table() tries to read a properly formatted <table>[...]</table>, and it looks like your data of interest is preformatted text.
library(rvest)
url <- "http://rotoguru1.com/cgi-bin/fyday.pl?week=1&game=dk&scsv=1"
html <- read_html(url)
text <- html_text(html_nodes(html, xpath = './/td[3]/pre'))
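library(stringr)  # str_split() comes from stringr, not stringi
str_split(text, "\n")[[1]]  # [[1]] extracts the character vector from the returned list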
Which gives you:
[1] "Week;Year;GID;Name;Pos;Team;h/a;Oppt;DK points;DK salary"
[2] "1;2021;1523;Mahomes II, Patrick;QB;kan;h;cle;36.28;8100"
[3] "1;2021;1537;Murray, Kyler;QB;ari;a;ten;34.56;7600"
[4] "1;2021;1490;Goff, Jared;QB;det;h;sfo;32.92;5100"
[5] "1;2021;1131;Brady, Tom;QB;tam;h;dal;32.16;6700"
[6] "1;2021;1501;Prescott, Dak;QB;dal;a;tam;31.42;6200"
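If a data frame is wanted rather than a vector of lines, the split lines can be converted with base R. This is only a sketch under assumptions: the lines vector below is copied from the first rows of the output above so that the example runs offline, and the names lines, fields and df are placeholders.

```r
# A few lines copied from the output above, so no network access is needed.
lines <- c(
  "Week;Year;GID;Name;Pos;Team;h/a;Oppt;DK points;DK salary",
  "1;2021;1523;Mahomes II, Patrick;QB;kan;h;cle;36.28;8100",
  "1;2021;1537;Murray, Kyler;QB;ari;a;ten;34.56;7600"
)

lines <- lines[nzchar(lines)]                  # drop empty strings, e.g. from a trailing newline
fields <- strsplit(lines, ";", fixed = TRUE)   # one character vector of fields per line

# The first element holds the header; the remaining elements are the data rows.
df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(df) <- fields[[1]]

# Every column is character at this point; convert numeric-looking ones.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Because the header is assigned directly, the column names stay exactly as on the page ("h/a", "DK points"), so they need backticks or [[ ]] when accessed, for example df[["DK points"]].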
Upvotes: 2