Yellow_truffle

Reputation: 923

Problem in scraping specific part of a website

I am looking at this website https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA== and want the most up-to-date price listed on it. I want to extract this price from the page body using the rvest package. To do this, I looked at the page's HTML source.

Following the instructions for the rvest package, I used the code shown below:

library(rvest)
a <- read_html('https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA==')
b <- a %>% html_nodes('div class="total-value"') %>% 
  html_text()
b

However, this results in an error: Error in parse_simple_selector(stream) : Expected selector, got <DELIM '=' at 10>. I also tried this code:

library(rvest)
a <- read_html('https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA==')
b <- a %>% html_nodes("span") %>% 
  html_text()
b

However, this returned more than 50 results, with the total price somewhere among them. How can I select only the total price?

Upvotes: 1

Views: 116

Answers (1)

Ronak Shah

Reputation: 388797

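html_nodes() expects a CSS selector, so the raw attribute text div class="total-value" cannot be parsed. You can instead target the tag where the value is stored by its id: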

library(rvest)

url <- 'https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA=='
url %>%
  read_html %>%
  html_nodes('span#lblTotalAssessedValue') %>%
  html_text()

#[1] "$380,900"

You can use readr::parse_number() to change the above value to numeric.
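
As a minimal offline sketch of the whole flow, the snippet below builds a small HTML fragment, selects the value by id and by class, and converts it to a number. The fragment is an assumption: it only mimics the div.total-value and span#lblTotalAssessedValue names mentioned in this thread, and the real page's markup may differ.

```r
library(rvest)
library(readr)

# Hypothetical stand-in for the real page; the structure is assumed
page <- minimal_html(
  '<div class="total-value">
     <span id="lblTotalAssessedValue">$380,900</span>
   </div>'
)

# Select by id, as in the answer above
price_text <- page %>%
  html_node("span#lblTotalAssessedValue") %>%
  html_text()

# A valid CSS class selector, in place of 'div class="total-value"'
price_text_by_class <- page %>%
  html_node("div.total-value span") %>%
  html_text()

# Strip the currency symbol and thousands separator
price <- parse_number(price_text)
price
```

An id selector is usually the safer choice when one is available, since ids are meant to be unique on a page while a class can match several elements.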

Upvotes: 1
