Reputation: 923
I am looking at this website https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA== and want the most up-to-date price listed on it. I want to extract this price from the page body using the rvest package. To do this I had a look at the HTML code for the website.
Following the instructions I saw for the rvest package, I used the code shown below:
library(rvest)
a <- read_html('https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA==')
b <- a %>% html_nodes('div class="total-value"') %>%
html_text()
b
However, this results in an error: Error in parse_simple_selector(stream) : Expected selector, got <DELIM '=' at 10>
I also tried this code:
library(rvest)
a <- read_html('https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA==')
b <- a %>% html_nodes("span") %>%
html_text()
b
However, this gave me more than 50 results, among which I can find the total price. How can I select only the total price?
Upvotes: 1
Views: 116
Reputation: 388797
You can target the tag where the value is stored by its id, here the span with id lblTotalAssessedValue.
library(rvest)
url <- 'https://www.bcassessment.ca//Property/Info/QTAwMDAwMVYyUA=='
url %>%
  read_html() %>%
  html_nodes('span#lblTotalAssessedValue') %>%
  html_text()
#[1] "$380,900"
You can use readr::parse_number() to convert the above value to numeric.
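As for the error in the question: html_nodes() expects a CSS selector, so a class is written with a dot (div.total-value) or with bracketed attribute syntax (div[class="total-value"]), not as raw HTML. The sketch below illustrates this on a small inline HTML fragment. The fragment is hypothetical and only assumes that the span sits inside a div with class total-value; the real page may be structured differently.

```r
library(rvest)

# Hypothetical fragment standing in for the real page; its structure is an assumption.
html <- '<div class="total-value"><span id="lblTotalAssessedValue">$380,900</span></div>'
page <- read_html(html)

# A class is selected with ".", an id with "#"; attribute matching needs brackets.
by_class <- page %>% html_nodes("div.total-value") %>% html_text()
by_attr  <- page %>% html_nodes('div[class="total-value"]') %>% html_text()
by_id    <- page %>% html_nodes("span#lblTotalAssessedValue") %>% html_text()

# Strip the currency symbol and thousands separator to get a numeric value.
value <- readr::parse_number(by_id)
value
```

Selecting by id is usually the safest choice when one is available, since an id should be unique within a page while a class may match several elements.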
Upvotes: 1