Hayden

Reputation: 163

Scraping javascript with RCurl

I have been trying, and failing, to write an R function to scrape and parse the GeoJSON data describing Uber's coverage areas from the company's website.

You can see a visual representation of the polygon I am trying to scrape overlaid on the map displayed here: https://www.uber.com/cities/atlanta. Looking at the page source in Firefox reveals that the geographic coordinates describing the polygon I'm after are found in this node:

<script type="text/javascript">
var cityJSON = { ... }
</script>

So that is the node I have been trying to grab with my script. However, it seems that node is not making it into R at all. Running

library(RCurl)
library(XML)

fileURL <- "https://www.uber.com/cities/atlanta"
xData <- getURL(fileURL)          # fetch the raw HTML
html_parsed <- htmlParse(xData)   # parse it into an XML document
print(html_parsed)

returns just about everything from the page source except the node I'm after! Does this have something to do with RCurl not loading the JavaScript? Am I approaching this problem all wrong?

(tested using OS X Mavericks)

Upvotes: 2

Views: 814

Answers (2)

Nick Kennedy

Reputation: 12640

With the httr, stringr, and jsonlite packages and the magrittr pipe:

library(httr)
library(jsonlite)
library(magrittr)
library(stringr)

url <- "https://www.uber.com/cities/atlanta"

x <- GET(url) %>%
  content(as = "text") %>%                          # body as a character string
  str_extract("(?<=cityJSON = )\\{.*?\\}(?=;)") %>% # pull out the JSON literal
  fromJSON

Note that the resulting list includes a member 'geojson', which will in turn need to be processed with fromJSON.
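For example, a minimal sketch of that second parse, assuming the pipeline above succeeded and that x$geojson is itself a JSON string as described (the object names and structure depend on Uber's page and may have changed):

city_shape <- fromJSON(x$geojson)  # parse the nested GeoJSON string
str(city_shape, max.level = 2)     # inspect the coordinate structure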

Upvotes: 4

user227710

Reputation: 3194

library(rvest)

k1 <- read_html("https://www.uber.com/cities/atlanta") %>%
  html_nodes("script") %>%
  .[3] %>%                    # the third <script> node holds the cityJSON variable
  html_text(trim = TRUE)

You will then need to use regular expressions to extract and format the data, for instance as sketched below.
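One possible sketch of that step, assuming k1 holds the script text from above and that the cityJSON assignment ends with a semicolon as in the question's page source (the jsonlite call and the exact pattern are illustrative additions, not part of this answer):

library(jsonlite)

# Grab everything between "cityJSON = " and the first "};", allowing the
# JSON to span multiple lines; adjust the pattern if Uber changes the markup.
json_txt <- regmatches(
  k1,
  regexpr("(?s)(?<=cityJSON = )\\{.*?\\}(?=;)", k1, perl = TRUE)
)
city <- fromJSON(json_txt)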

Upvotes: 3
