rookie

Reputation: 63

Not able to entirely scrape HTML table using R

I've used the following R script:

library(XML)

url <- "http://stats.espncricinfo.com/ci/engine/player/253802.html?class=3;orderby=default;template=results;type=batting"
check <- readHTMLTable(url, header = TRUE)
check$"Career summary"
check <- check$"Career summary"

I'm only able to scrape the first 11 observations.

Can anyone suggest why I'm unable to scrape the entire table?

Upvotes: 0

Views: 603

Answers (2)

GGamba

Reputation: 13680

As @Wietze314 said, there is more than one table on that page. You can get a list of all the tables I suppose you are interested in with:

library(XML)

url <- "http://stats.espncricinfo.com/ci/engine/player/253802.html?class=3;orderby=default;template=results;type=batting"

# Parse the page once, then read every <tbody> as its own data frame
check <- htmlParse(url)

tableNodes <- getNodeSet(check, '//tbody')
tbList <- lapply(tableNodes, readHTMLTable)

tbList then contains 22 data.frames for you to work with.
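To find the data frame you want in a list like that, it can help to look at the dimensions of each element. This is only a sketch on a made-up two-table page: the inline HTML, the captions and the names `dims` and `biggest` are assumptions for illustration, not the real ESPN markup.

```r
library(XML)

# Hypothetical stand-in page with two tables (NOT the real ESPN markup)
html <- '
<html><body>
<table><caption>Career summary</caption>
<tbody>
<tr><td>overall</td><td>10</td></tr>
<tr><td>home</td><td>6</td></tr>
</tbody></table>
<table><caption>Match list</caption>
<tbody>
<tr><td>m1</td><td>30</td></tr>
<tr><td>m2</td><td>12</td></tr>
<tr><td>m3</td><td>45</td></tr>
</tbody></table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Same approach as above: one data frame per <tbody>
tableNodes <- getNodeSet(doc, "//tbody")
tbList <- lapply(tableNodes, readHTMLTable,
                 header = FALSE, stringsAsFactors = FALSE)

# One row per data frame: number of rows, number of columns
dims <- t(sapply(tbList, dim))
print(dims)

# Pick the data frame with the most rows
biggest <- tbList[[which.max(sapply(tbList, nrow))]]
```

On the real page you would run the last two steps on the tbList built from the URL and then index the element you need, e.g. tbList[[3]].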

Upvotes: 0

Wietze314

Reputation: 6020

To get the content of all tables on the page:

library(XML)

url="http://stats.espncricinfo.com/ci/engine/player/253802.html?class=3;orderby=default;template=results;type=batting"

content <- htmlParse(url)

tbody <- xpathSApply(content, "//tbody")

lapply(tbody, function(x) readHTMLTable(x, header = TRUE))
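If the table you are after turns out to be split across several tbody elements (an assumption about the page, I have not verified it), you could stack the matching pieces back together. A minimal sketch on made-up HTML; `parts`, `same_width` and `combined` are names I invented for the example:

```r
library(XML)

# Hypothetical table whose rows are split over two <tbody> elements
html <- '
<html><body><table>
<tbody>
<tr><td>a</td><td>1</td></tr>
<tr><td>b</td><td>2</td></tr>
</tbody>
<tbody>
<tr><td>c</td><td>3</td></tr>
</tbody>
</table></body></html>'

doc <- htmlParse(html, asText = TRUE)

# One data frame per <tbody>, no header row assumed
parts <- lapply(getNodeSet(doc, "//tbody"), readHTMLTable,
                header = FALSE, stringsAsFactors = FALSE)

# Keep only the pieces with the same column count as the first one,
# then bind them row-wise into a single data frame
same_width <- Filter(function(df) ncol(df) == ncol(parts[[1]]), parts)
combined <- do.call(rbind, same_width)
```

On a real page you would first check which list elements actually belong to the same table before binding them, since matching column counts alone do not guarantee that.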

Upvotes: 1
