Reputation: 57
I am trying to scrape the following: 13.486 Kč from: https://www.aofis.cz/informace-pro-klienty/elba-opf/
For some reason, the following code does not seem to find the number. I am rather a newbie to this so perhaps it is because the string in xml_find_all is wrong. Can anyone please have a look why?
library(xml)
library(xml2)
page <- "https://www.aofis.cz/informace-pro-klienty/elba-opf/"
read_page <- read_html(page)
Price <- read_page %>%
rvest::html_nodes('page-content') %>%
xml2::xml_find_all("//strong[contains(@class 'sg_selected')]") %>%
rvest::html_text()
Price
Thank you!! Michael
Upvotes: 0
Views: 55
Reputation: 174526
The html code you see in your browser developer panel (or selector gadget) is not the same as the content that is being delivered to your R session. It is actually a javascript file which then builds the web page. This is why your rvest
call isn't finding the correct html node: there are no html nodes in the string you are processing!
There are a few different ways to get the information you want, but perhaps the best is to just get the monetary values from the javascript code using regex:
page <- "https://www.aofis.cz/informace-pro-klienty/elba-opf/"
read_page <- httr::content(httr::GET(page), "text")
stringr::str_extract_all(read_page, "\\d+\\.\\d+ K")[[1]][1]
#> [1] "13.486 K"
Upvotes: 1