Reputation: 163
I have been trying and failing to write an R function to scrape and parse GeoJSON data describing Uber coverage areas from the company's website.
The polygon I am trying to scrape is displayed, overlaid on a map, here: https://www.uber.com/cities/atlanta. Viewing the page source in Firefox reveals that the geographic coordinates describing the polygon are found in this node:
<script type="text/javascript">
var cityJSON = { ... }
</script>
So that is the node I have been trying to grab with a script. However, it seems that the node is not making it into R at all. Running
library(RCurl)
library(XML)

fileURL <- "https://www.uber.com/cities/atlanta"
xData <- getURL(fileURL)         # fetch the raw HTML
html_parsed <- htmlParse(xData)  # parse it into an XML document tree
print(html_parsed)
returns just about everything from the page source except the node I'm after! Does this have something to do with RCurl not executing the page's JavaScript? Am I approaching this problem all wrong?
(tested using OS X Mavericks)
Upvotes: 2
Views: 814
Reputation: 12640
With the httr, stringr, and jsonlite packages and the magrittr pipe:
library(httr)
library(stringr)
library(jsonlite)
library(magrittr)

url <- "https://www.uber.com/cities/atlanta"

x <- GET(url) %>%                                      # fetch the page
  content(as = "text") %>%                             # raw HTML as a single string
  str_extract("(?<=cityJSON = )\\{.*?\\}(?=;)") %>%    # grab the object assigned to cityJSON
  fromJSON                                             # parse the JSON into a list
Note that the resulting list includes a member 'geojson', which will in turn need processing through fromJSON().
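A minimal sketch of that second pass, assuming (as noted above) that the 'geojson' member is itself a JSON string:

# Hypothetical follow-up step: parse the nested JSON string
geo <- fromJSON(x$geojson)
str(geo)  # inspect the polygon coordinates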
Upvotes: 4
Reputation: 3194
library(rvest)

# Read the page, take the third <script> node (the one holding cityJSON),
# and return its text content
k1 <- read_html("https://www.uber.com/cities/atlanta") %>%
  html_nodes("script") %>%
  .[3] %>%
  html_text(trim = TRUE)
You then need a regular expression to extract and format the JSON from this text, as sketched below.
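For example, a minimal sketch of that extraction step, reusing the pattern from the httr answer above (the (?s) flag lets . cross line breaks, in case the script text spans multiple lines):

library(jsonlite)

# Pull the object literal assigned to cityJSON out of the script text,
# then parse it into a list
m <- regmatches(k1, regexpr("(?s)(?<=cityJSON = )\\{.*?\\}(?=;)", k1, perl = TRUE))
city <- fromJSON(m)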
Upvotes: 3