Reputation: 171
I have a dataset of 464 Toronto addresses. The addresses look like this:
raw_data = as.data.frame(c("570 BLOOR ST W TORONTO ON M6G1K1", "10 STAYNER AVE NORTH YORK ON M6B1N4", "1200 WOODBINE AVE EAST YORK ON M4C4E3", "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3")) %>% setNames("address")
address
1 570 BLOOR ST W TORONTO ON M6G1K1
2 10 STAYNER AVE NORTH YORK ON M6B1N4
3 1200 WOODBINE AVE EAST YORK ON M4C4E3
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3
I want to add a variable that says the ward of the city that each address is a part of. The city website has an application that allows you to check what ward each address is in. Thus, I could enter each of the 464 addresses manually and record the ward. However, I'm wondering if there's a way to automate this task in R. I'd really appreciate any input!
For reference, the desired output for the addresses I listed would be:
cleaned_data = as.data.frame(
cbind(c("570 BLOOR ST W TORONTO ON M6G1K1", "10 STAYNER AVE NORTH YORK ON M6B1N4", "1200 WOODBINE AVE EAST YORK ON M4C4E3", "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"),
c("University-Rosedale", "Eglinton-Lawrence", "Beaches-East York", "Scarborough"))
) %>% setNames(c("address", "ward"))
address ward
1 570 BLOOR ST W TORONTO ON M6G1K1 University-Rosedale
2 10 STAYNER AVE NORTH YORK ON M6B1N4 Eglinton-Lawrence
3 1200 WOODBINE AVE EAST YORK ON M4C4E3 Beaches-East York
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 Scarborough
One extra challenge here is that some of the addresses in my dataset don't correspond to a unique address on the city website (e.g. row 4 of my example data). Having an automated solution to this would be great, but if it's too challenging, I should be able to do the few that are like this manually in a reasonable amount of time.
Upvotes: 0
Views: 94
Reputation: 6583
A solution without RSelenium. By the way, the last address that you provided does not exist according to the website.
library(tidyverse)
library(httr2)

df <- tibble(
  address = c(
    "570 BLOOR ST W TORONTO ON M6G1K1",
    "10 STAYNER AVE NORTH YORK ON M6B1N4",
    "1200 WOODBINE AVE EAST YORK ON M4C4E3",
    "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"
  )
)

# Look up one address with the city's search endpoint and return its ward
get_ward <- function(query) {
  response <- paste0("https://map.toronto.ca/geoservices/rest/search/rankedsearch?searchArea=1&matchType=1&projectionType=1&retRowLimit=10&areaTypeCode1=CITW&areaTypeCode2=WD03&searchString=",
                     query) %>%
    str_replace_all(" ", "%20") %>%  # escape spaces in the URL
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    .$result %>%
    .$bestResult %>%
    .$detail %>%
    str_extract("(?<=[:]).*") %>%    # keep the text after the colon
    str_squish()
  # No match on the website: return NULL
  if (length(response) == 0) NULL else response
}

# as.character() turns an unmatched (NULL) result into the string "NULL"
df %>%
  mutate(ward = map(address, get_ward) %>%
           as.character())
# A tibble: 4 x 2
address ward
<chr> <chr>
1 570 BLOOR ST W TORONTO ON M6G1K1 University-Rosedale (11)
2 10 STAYNER AVE NORTH YORK ON M6B1N4 Eglinton-Lawrence (8)
3 1200 WOODBINE AVE EAST YORK ON M4C4E3 Beaches-East York (19)
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 NULL
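For rows like the fourth one, which come back as NULL, one option is to simplify the address and try again. The following is only a rough sketch in base R: `simplify_address()` is a hypothetical helper, and it assumes that the troublesome parts are a street-number range at the start (such as "2480-2490") and a unit designator written as "UNIT" followed by one token. I have not checked whether the website matches the simplified form.

```r
# Hypothetical helper (not part of the code above): simplify an address
# before passing it to get_ward().
simplify_address <- function(address) {
  # Keep only the first number of a street-number range: "2480-2490" -> "2480"
  address <- sub("^([0-9]+) *- *[0-9]+", "\\1", address)
  # Drop a unit designator such as "UNIT 20A" (assumed format)
  address <- gsub(" +UNIT +[^ ]+", "", address)
  # Collapse any repeated spaces left behind
  trimws(gsub(" +", " ", address))
}

simplify_address("2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3")
# [1] "2480 GERRARD STREET EAST TORONTO ON M1N 4C3"
```

You could then call `get_ward(simplify_address(address))` only for the rows where the first lookup found nothing, and handle whatever is still unmatched by hand.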
Upvotes: 1
Reputation: 137
Yes, there is a way to do that using RSelenium.
It should look something like this:
library(RSelenium)
library(tidyverse)

# Create the driver
remDr0 <- rsDriver(browser = "firefox", port = 4089L)
remDr <- remDr0$client

# Open or close the browser
remDr$open()
remDr$close()
remDr$open()

# Open a web page
url <- "https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/ward-profiles/"
remDr$navigate(url)

wardlooker <- function(adresse) {
  # Find the address search box, clear any earlier text, and type the address
  Recherche <- remDr$findElement('css selector', '#js_input__address')
  Recherche$clearElement()
  Recherche$sendKeysToElement(list(adresse))
  # Click the search button
  frames <- remDr$findElements("css selector", '.btn-lg')
  frames[[1]]$clickElement()
  # Read the text of the result element(s); the selector still has to be filled in
  art <- remDr$findElements('css selector', 'here the css of where the result should pop up that I could not find')
  ward <- unlist(lapply(art, function(x) { x$getElementText() }))
  # Return the ward text explicitly
  ward
}
And then you can apply this function to all your addresses with map.
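As a rough sketch of that last step, here is one way to run a lookup over every address without the whole loop stopping when a single lookup fails. It uses base R (`vapply()` and `tryCatch()`); with purrr, `map_chr()` combined with `possibly()` would do the same job. `lookup_all()` and `fake_lookup()` are hypothetical names: `fake_lookup()` only stands in for `wardlooker()` so the example runs without a browser.

```r
# Hypothetical wrapper: apply a lookup function to every address and
# return NA instead of stopping when one lookup fails or finds nothing.
lookup_all <- function(addresses, lookup) {
  vapply(addresses, function(a) {
    tryCatch({
      res <- lookup(a)
      if (length(res) == 0) NA_character_ else as.character(res[[1]])
    }, error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in for wardlooker(), so the sketch runs without a browser
fake_lookup <- function(a) {
  if (a == "bad address") stop("element not found")
  if (a == "unknown") return(character(0))
  paste("ward for", a)
}

lookup_all(c("570 BLOOR ST W", "bad address", "unknown"), fake_lookup)
```

With the real function this would be something like `raw_data$ward <- lookup_all(raw_data$address, wardlooker)`, and the rows left as NA are the ones to check by hand.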
Another way to do it would be to use QGIS with a map of the wards.
Upvotes: 0