Reputation: 2074
I'm working in R, trying to scrape some stats for multiple players from www.baseball-reference.com. I've been able to scrape other elements from specific pages on the site, but have run into problems scraping from a particular table that appears on all the players' stats pages. The table id is 'batting_value', and the caption that appears on the page as the table header is 'Player Value--Batting'.
Here's an example page:
https://www.baseball-reference.com/players/b/brownro02.shtml
I'm interested in scraping the 'PA' value from the bottom row of the 'Player Value--Batting' table.
I've tried Inspect > Copy XPath, which gets me the following XPath in the case of the above example URL:
//*[@id="batting_value"]/tfoot/tr/td[3]
But when I try to scrape using that path...
library(dplyr)
library(rvest)
url <- 'https://www.baseball-reference.com/players/b/brownro02.shtml'
xpath <- '//*[@id="batting_value"]/tfoot/tr/td[3]'
tables <- read_html(url)
pa <- tables %>%
  html_node(xpath = xpath) %>%
  html_text()
pa
[1] NA
It looks like rvest isn't even finding the node:
tables %>%
  html_node(xpath = xpath)
{xml_missing}
<NA>
Why isn't this node being found by html_node, and how would I go about scraping this value from the Player Value--Batting table?
Upvotes: 0
Views: 579
Reputation: 419
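The table is inside an HTML comment in the page source, so it isn't a node in the parsed document and html_node can't reach it with that XPath. You can pull out the comment text and parse it as HTML: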
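library(rvest)
url = 'https://www.baseball-reference.com/players/b/brownro02.shtml'
# The table sits inside an HTML comment under the element with id "all_batting_value",
# so grab the comment text, re-parse it as HTML, and read the table from that.
tab = read_html(url) %>%
  html_nodes(xpath = '//*[@id="all_batting_value"]//comment()') %>%
  html_text() %>%
  read_html() %>%
  html_table() %>%
  as.data.frame()
tab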
Year Age Tm Lg G PA Rbat Rbaser Rdp Rfield Rpos RAA WAA Rrep RAR WAR waaWL. X162WL. oWAR dWAR oRAR Salary Pos
1 1999 23 CHC NL 33 70 -4 0 0 -3 0 -8 -0.8 2 -5 -0.5 0.478 0.495 -0.3 -0.3 -3 7/89
2 2000 24 CHC NL 45 98 4 0 0 0 -1 3 0.3 3 6 0.6 0.507 0.502 0.6 -0.2 7 $210,000 7/98
3 2001 25 CHC NL 39 92 2 0 0 0 -1 0 0.0 3 3 0.3 0.500 0.500 0.3 -0.2 3 $230,000 7/D98
4 2002 26 CHC NL 111 231 -11 -1 0 -3 -2 -16 -1.7 7 -9 -1.0 0.485 0.490 -0.7 -0.6 -6 $255,000 78/9D
5 4 Seasons 4 Seasons 4 Seasons 228 491 -9 -1 0 -6 -4 -21 -2.2 15 -5 -0.8 0.491 0.495 -0.1 -1.2 1 $695,000
Awards
1 NA
2 NA
3 NA
4 NA
5 NA
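To get just the 'PA' value from the bottom row, here is a minimal offline sketch of the same idea. The inline HTML string is a made-up stand-in for the real page (only the ids 'all_batting_value' and 'batting_value' come from the question; the rows and numbers are invented), so treat it as an illustration rather than the site's actual markup:

```r
library(rvest)

# Hypothetical stand-in for the page: the table is wrapped in an HTML comment.
html <- '<div id="all_batting_value">
<!--
<table id="batting_value">
<thead><tr><th>Year</th><th>G</th><th>PA</th></tr></thead>
<tbody>
<tr><th>2001</th><td>39</td><td>92</td></tr>
<tr><th>2002</th><td>111</td><td>231</td></tr>
</tbody>
<tfoot><tr><th>2 Seasons</th><td>150</td><td>323</td></tr></tfoot>
</table>
-->
</div>'

page <- read_html(html)

# Looking for the table directly fails, because commented-out markup is not parsed.
direct <- html_node(page, xpath = '//*[@id="batting_value"]')
inherits(direct, "xml_missing")

# Take the comment text and parse it as its own HTML document.
inner <- page %>%
  html_node(xpath = '//*[@id="all_batting_value"]//comment()') %>%
  html_text() %>%
  read_html()

# Read the table and pick the PA column of the last (totals) row.
tab <- inner %>%
  html_node(xpath = '//*[@id="batting_value"]') %>%
  html_table()
pa <- tab$PA[nrow(tab)]
pa
```

On the real page the same last-row lookup can be applied to the tab data frame from the answer, i.e. tab$PA[nrow(tab)]; in the output printed above, that bottom-row PA is 491.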
Upvotes: 2