Reputation: 313
I would like to web scrap the links that are nested in the name of the property, this script works, however, not retrieves the URLs only NA
s. Could you help me or what I am missing in the script snipped.
Thank you
# Test
library(rvest)
library(dplyr)
link <- "https://www.sreality.cz/hledani/prodej/byty/brno?_escaped_fragment_="
page <- read_html(link)
price <- page %>%
html_elements(".norm-price.ng-binding") %>%
html_text()
name <- page %>%
html_elements(".name.ng-binding") %>%
html_text()
location <- page %>%
html_elements(".locality.ng-binding") %>%
html_text()
href <- page %>%
html_nodes(".name.ng-binding") %>%
html_attr("href") %>% paste("https://www.sreality.cz", ., sep="")
flat <- data.frame(price, name, location, href, stringsAsFactors = FALSE)
Upvotes: 0
Views: 622
Reputation:
Your CSS selector picked the anchors' inline html instead of the anchor. This should work:
page %>%
html_nodes("a.title") %>%
html_attr("ng-href") %>%
paste0("https://www.sreality.cz", .)
paste0(...)
being a shorthand for paste(..., sep = '')
Upvotes: 2
Reputation: 3173
Another way using JS path
page %>%
html_nodes('#page-layout > div.content-cover > div.content-inner > div.transcluded-content.ng-scope > div > div > div.content > div > div:nth-child(4) > div > div:nth-child(n)') %>%
html_nodes('a') %>% html_attr('href') %>% str_subset('detail') %>% unique() %>% paste("https://www.sreality.cz", ., sep="")
[1] "https://www.sreality.cz/detail/prodej/byt/4+1/brno-zabrdovice-tkalcovska/1857071452"
[2] "https://www.sreality.cz/detail/prodej/byt/3+kk/brno--/1336764508"
[3] "https://www.sreality.cz/detail/prodej/byt/2+kk/brno-stary-liskovec-u-posty/3639359836"
[4] "https://www.sreality.cz/detail/prodej/byt/2+1/brno-reckovice-druzstevni/3845994844"
[5] "https://www.sreality.cz/detail/prodej/byt/2+1/brno-styrice-jilova/1102981468"
[6] "https://www.sreality.cz/detail/prodej/byt/1+kk/brno-dolni-herspice-/1961502812"
Upvotes: 0