Reputation: 71
I have a large number of Google Maps URLs and would like to extract a clean address from each one for geocoding. I recently found getURL() in the RCurl package, which returns a lot of information
library(RCurl)
getURL("https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US")
but all I'm really interested in is the address snippet near the start of the getURL() output:
...< meta content=\"loc: 2440 Seattle, 98116 WA US - Google Maps\" property=\"og:title\">...
Update: I just realized the URL above is a bad example. Here is a different one:
getURL("https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US")
...< meta content=\"loc: 10056 Interlake Ave N seattle WA US - Google Maps\" property=\"og:title\">...
Does anyone have suggestions on how to go about this efficiently? My apologies, I'm at an intermediate level with R and would appreciate your help. Thanks!
Tim
Upvotes: 4
Views: 934
Reputation: 21497
Use the Google Maps Geocoding API with XML output, as follows:
library(XML)
# Base URL of the geocoding endpoint (XML output)
burl <- "http://maps.google.com/maps/api/geocode/xml?address="
address <- "2440 Seattle, 98116 WA US"
# Percent-encode the address and append it to the base URL
request <- paste0(burl, URLencode(address))
doc <- htmlTreeParse(request, useInternalNodes = TRUE)
# Interpreted address
xmlValue(doc[["//formatted_address"]])
[1] "2440, Seattle-Tacoma International Airport (SEA), Seattle, WA 98158, USA"
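If you want to keep the request-building step separate from the download, here is a minimal sketch. It assumes the maps.googleapis.com endpoint and an API key passed as the `key` parameter; `build_geocode_url` and its arguments are hypothetical names of my own, and the function only builds the URL string without making a network call.

```r
# Hypothetical helper: build a geocoding request URL (no network access here).
# Assumption: the endpoint below and the "key" parameter match your API setup.
build_geocode_url <- function(address, key = NULL,
                              base = "https://maps.googleapis.com/maps/api/geocode/xml") {
  # reserved = TRUE also percent-encodes characters such as "," and "&"
  url <- paste0(base, "?address=", utils::URLencode(address, reserved = TRUE))
  # Append the key only if one was supplied (placeholder, bring your own)
  if (!is.null(key)) url <- paste0(url, "&key=", key)
  url
}

request <- build_geocode_url("2440 Seattle, 98116 WA US")
request
```

The resulting string can be passed to the parsing call in place of the pasted URL.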
EDIT
If you only have the encoded URL, use URLdecode to decode it instead of downloading the page:
URL <- "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US"
URL <- gsub(".*loc","",URL) # Get rid of https://...
URL <- URLdecode(URL)
gsub("[:]|[+]", " ", URL) # Get rid of ":" and "+"
[1] " 2440 Seattle, 98116 WA US"
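Since you have many URLs, the same decoding steps can be wrapped in a function and applied to a whole vector. This is only a sketch, assuming every URL carries the address in a `q=loc%3A+...` query parameter as in your two examples; `extract_address` is a hypothetical name, and only base R is used.

```r
# Hypothetical helper: extract the address from "?q=loc%3A+..." style URLs.
extract_address <- function(urls) {
  vapply(urls, function(u) {
    q <- sub(".*[?&]q=", "", u)            # keep only the value of the q parameter
    q <- sub("&.*$", "", q)                # drop any parameters that follow it
    q <- gsub("+", " ", q, fixed = TRUE)   # "+" encodes a space in query strings
    q <- utils::URLdecode(q)               # "%3A" -> ":", "%32" -> "2", ...
    q <- sub("^loc:\\s*", "", q)           # strip the "loc:" prefix
    trimws(q)                              # remove leading/trailing whitespace
  }, character(1), USE.NAMES = FALSE)
}

urls <- c(
  "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US",
  "https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US"
)
extract_address(urls)
# [1] "2440 Seattle, 98116 WA US"          
# [2] "10056 Interlake Ave N seattle WA US"
```

Replacing "+" before decoding means a literal plus sign encoded as %2B would survive, and trimws() takes care of the leading space left over by the gsub approach.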
Upvotes: 3