leecarvallo

Reputation: 171

How can I automate searching strings on a website search tool and record the data in R?

I have a dataset of 464 Toronto addresses. The addresses look like this:

library(magrittr)  # provides %>%

raw_data = as.data.frame(c("570 BLOOR ST W TORONTO ON M6G1K1", "10 STAYNER AVE NORTH YORK ON M6B1N4", "1200 WOODBINE AVE EAST YORK ON M4C4E3", "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3")) %>% setNames("address")

                                                  address
1                          570 BLOOR ST W TORONTO ON M6G1K1
2                       10 STAYNER AVE NORTH YORK ON M6B1N4
3                     1200 WOODBINE AVE EAST YORK ON M4C4E3
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3

I want to add a variable giving the city ward that each address belongs to. The city website has an application that lets you look up the ward for an address, so I could enter each of the 464 addresses manually and record the ward. However, I'm wondering if there's a way to automate this task in R. I'd really appreciate any input!

For reference, the desired output for the addresses I listed would be:

cleaned_data = as.data.frame(
  cbind(c("570 BLOOR ST W TORONTO ON M6G1K1", "10 STAYNER AVE NORTH YORK ON M6B1N4", "1200 WOODBINE AVE EAST YORK ON M4C4E3", "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"),
        c("University-Rosedale", "Eglinton-Lawrence", "Beaches-East York", "Scarborough"))
) %>% setNames(c("address", "ward"))


                                                    address                ward
1                          570 BLOOR ST W TORONTO ON M6G1K1 University-Rosedale
2                       10 STAYNER AVE NORTH YORK ON M6B1N4   Eglinton-Lawrence
3                     1200 WOODBINE AVE EAST YORK ON M4C4E3   Beaches-East York
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3         Scarborough

One extra challenge here is that some of the addresses in my dataset don't correspond to a unique address on the city website (e.g. row 4 of my example data). Having an automated solution to this would be great, but if it's too challenging, I should be able to do the few that are like this manually in a reasonable amount of time.

Upvotes: 0

Views: 94

Answers (2)

HoelR

Reputation: 6583

A solution without RSelenium. By the way, the last address that you provided does not exist according to the website.

library(tidyverse)
library(httr2)

df <- tibble(
  address = c(
    "570 BLOOR ST W TORONTO ON M6G1K1",
    "10 STAYNER AVE NORTH YORK ON M6B1N4",
    "1200 WOODBINE AVE EAST YORK ON M4C4E3",
    "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"
  )
)

get_ward <- function(query) {
  # Build the search URL for one address and percent-encode the spaces
  response <- paste0("https://map.toronto.ca/geoservices/rest/search/rankedsearch?searchArea=1&matchType=1&projectionType=1&retRowLimit=10&areaTypeCode1=CITW&areaTypeCode2=WD03&searchString=",
         query) %>%
    str_replace_all(" ", "%20") %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    .$result %>%
    .$bestResult %>%
    .$detail %>%
    str_extract("(?<=[:]).*") %>%  # keep only the text after the colon
    str_squish()

  # Return NULL when the search found no match for the address
  if (length(response) == 0) NULL else response
}

df %>%  
  mutate(ward = map(address, get_ward) %>%  
           as.character()) 

# A tibble: 4 x 2
  address                                                   ward                    
  <chr>                                                     <chr>                   
1 570 BLOOR ST W TORONTO ON M6G1K1                          University-Rosedale (11)
2 10 STAYNER AVE NORTH YORK ON M6B1N4                       Eglinton-Lawrence (8)   
3 1200 WOODBINE AVE EAST YORK ON M4C4E3                     Beaches-East York (19)  
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 NULL          
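
The ward column above carries the ward number in parentheses, while the desired output in the question has only the ward name. Below is a minimal sketch of one way to split the two. The helper name `split_ward` is hypothetical (not part of the answer), and it assumes the values look like the ones printed above, including the literal string "NULL" that `as.character()` produces for an unmatched address.

```r
library(stringr)

# Hypothetical helper: split "University-Rosedale (11)" into name and number.
split_ward <- function(ward) {
  # as.character() on a list turns a NULL element into the string "NULL";
  # treat that as a missing value.
  ward[ward == "NULL"] <- NA_character_
  data.frame(
    # Drop the trailing "(number)" to keep only the ward name
    ward_name   = str_squish(str_remove(ward, "\\s*\\(\\d+\\)\\s*$")),
    # Pull out the digits inside the trailing parentheses
    ward_number = as.integer(str_extract(ward, "(?<=\\()\\d+(?=\\)\\s*$)")),
    stringsAsFactors = FALSE
  )
}

wards <- c("University-Rosedale (11)", "Eglinton-Lawrence (8)",
           "Beaches-East York (19)", "NULL")
split_ward(wards)
```

The result could then be attached to the original table with `bind_cols()` or `cbind()`.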

Upvotes: 1

Léo Henry

Reputation: 137

Yes, you can do this with RSelenium.

It should look something like this:

library(RSelenium)
library(tidyverse)

# Create the driver
remDr0 <- rsDriver(browser = "firefox", port=4089L)
remDr <- remDr0$client

# Open or close the browser
remDr$open()
remDr$close()
remDr$open()

# Open a web page
url <- "https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/ward-profiles/"
remDr$navigate(url)


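wardlooker <- function(adresse){
  # Type the address into the search box
  Recherche <- remDr$findElement('css selector', '#js_input__address')
  Recherche$sendKeysToElement(list(adresse))

  # Click the first .btn-lg element (presumably the search button)
  frames <- remDr$findElements("css selector", '.btn-lg')
  frames[[1]]$clickElement()

  # Read the text of the result element(s); the selector below is still a placeholder
  art <- remDr$findElements('css selector', 'here the css of where the result should pop up that I could not find')
  ward <- unlist(lapply(art, function(x){x$getElementText()}))
  ward  # return the ward text visibly
}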

You can then apply this function to all your addresses with map.
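
A rough sketch of that last step, assuming purrr is available: `lookup_stub` is a hypothetical stand-in for `wardlooker` so the example runs without a browser session, and `possibly()` makes a failed lookup return `NA` instead of stopping the whole loop.

```r
library(purrr)

# Hypothetical stand-in for wardlooker(); it errors on addresses it does not know,
# much as a browser lookup might when no result element is found.
lookup_stub <- function(adresse) {
  known <- c("570 BLOOR ST W TORONTO ON M6G1K1" = "University-Rosedale")
  if (!adresse %in% names(known)) stop("no result for: ", adresse)
  known[[adresse]]
}

# Wrap the lookup so an error yields NA rather than aborting the run
safe_lookup <- possibly(lookup_stub, otherwise = NA_character_)

addresses <- c("570 BLOOR ST W TORONTO ON M6G1K1", "NOT A REAL ADDRESS")
wards <- map_chr(addresses, safe_lookup)
```

With the real function, `lookup_stub` would be replaced by `wardlooker`; rows that come back as `NA` are the ones to check by hand.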

Another way would be to use QGIS with a map of the ward boundaries.

Upvotes: 0
