Tammboy

Reputation: 331

R: scraping an HTML webpage using XML

I am trying to scrape this webpage using the following code.

library(XML)
url <- html("http://www.gallop.co.za/")
doc <- htmlParse(url)
lat <- xpathSApply(doc,path="//p[@id=Racecards]",fun = xmlGetAttr , name = 'Racecards')

I looked at the webpage, and the table I want to scrape is the racecards table; primarily I want the links to where the racecard data is.

I used SelectorGadget, which returns the XPath:

//*[(@id = "Racecards")]

However, when I use the R code, it returns an empty list. It feels like I'm getting the XPath wrong somehow. What is the correct way to return the table, and also the links within it?
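(For reference, one problem visible in the code above is that attribute values inside an XPath predicate must be quoted: `@id="Racecards"`, not `@id=Racecards`. A minimal sketch on toy HTML, since the real table is injected client-side and would not appear in the static source anyway:)

```r
library(XML)

# Toy HTML standing in for the page; the real racecards table is inserted
# by JavaScript, so it is absent from the page's static source.
html <- '<div id="Racecards"><a href="/racecard/1">Race 1</a></div>'
doc <- htmlParse(html)

# Quote the attribute value in the predicate, and pass a real attribute
# name (e.g. "href") to xmlGetAttr to pull the links out
links <- xpathSApply(doc, '//*[@id="Racecards"]//a', xmlGetAttr, "href")
links
```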

Upvotes: 0

Views: 598

Answers (1)

Dongdong Kong

Reputation: 416

It seems that the data are fetched as JSON and inserted into the page with JavaScript, so you can't get them from the static HTML. You can fetch the JSON directly:

library(RCurl)
library(jsonlite)

# Fetch the JSON feed that the page's JavaScript loads, then parse it
p <- getURL("http://www.gallop.co.za/cache/horses.json")
fromJSON(p)
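Once parsed, `fromJSON` typically turns a JSON array of objects into a data frame, whose columns you can use to rebuild the links. A minimal sketch on a hypothetical payload (the real schema of `horses.json` is not shown here, so the field names below are assumptions):

```r
library(jsonlite)

# Hypothetical payload standing in for whatever horses.json actually returns;
# the "racecard" field name is an assumption for illustration.
p <- '[{"horse":"Sea Cottage","racecard":"/racecard/1"},
       {"horse":"Politician","racecard":"/racecard/2"}]'

races <- fromJSON(p)   # a JSON array of objects becomes a data frame
links <- paste0("http://www.gallop.co.za", races$racecard)
links
```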

Upvotes: 1
