Jaroslav Kotrba

Reputation: 313

Web scraping of nested links with R

I would like to scrape the links that are nested in the name of each property. The script below runs, but it returns only NAs instead of the URLs. Could you help me find what I am missing in the snippet?

Thank you

# Test
library(rvest)
library(dplyr)

link <- "https://www.sreality.cz/hledani/prodej/byty/brno?_escaped_fragment_="
page <- read_html(link)

price <- page %>% 
  html_elements(".norm-price.ng-binding") %>% 
  html_text()

name <- page %>% 
  html_elements(".name.ng-binding") %>% 
  html_text()

location <- page %>% 
  html_elements(".locality.ng-binding") %>% 
  html_text()

href <- page %>% 
  html_nodes(".name.ng-binding") %>% 
  html_attr("href") %>% paste("https://www.sreality.cz", ., sep="")

flat <- data.frame(price, name, location, href, stringsAsFactors = FALSE)

Upvotes: 0

Views: 622

Answers (2)

user18309711

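Your CSS selector picks the element nested inside the anchor rather than the anchor itself. That inner element has no href attribute, so html_attr() returns NA. Selecting the anchor should work: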

page %>% 
  html_nodes("a.title") %>% 
  html_attr("ng-href") %>% 
  paste0("https://www.sreality.cz", .)

Here paste0(...) is shorthand for paste(..., sep = '').
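To illustrate the difference between the two selectors, here is a minimal sketch on made-up markup. The HTML fragment, the example path, and the variable names are assumptions for illustration only; the real page structure may differ.

```r
library(rvest)

# Hypothetical markup imitating a single listing (not copied from the site).
doc <- minimal_html('
  <a class="title" href="/detail/prodej/byt/2+kk/example/123">
    <span class="name">Prodej bytu 2+kk</span>
  </a>')

# The inner span has no href attribute, so html_attr() returns NA.
span_href <- doc %>% html_elements(".name") %>% html_attr("href")

# The anchor itself carries the attribute.
anchor_href <- doc %>% html_elements("a.title") %>% html_attr("href")

# paste0() turns NA into the text "NA", so a missed selector shows up
# as a URL ending in "NA" rather than as a real missing value.
bad_url  <- paste0("https://www.sreality.cz", span_href)
good_url <- paste0("https://www.sreality.cz", anchor_href)
```

Checking is.na() on the extracted attribute before pasting the domain in front makes a wrong selector easier to spot.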

Upvotes: 2

Nad Pat

Reputation: 3173

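Another way is to use the JS path of the listing container as the selector, then keep only the links that point to a detail page: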

library(stringr)  # provides str_subset()

page %>% 
  html_nodes('#page-layout > div.content-cover > div.content-inner > div.transcluded-content.ng-scope > div > div > div.content > div > div:nth-child(4) > div > div:nth-child(n)') %>% 
  html_nodes('a') %>% html_attr('href') %>% str_subset('detail') %>% unique() %>% paste("https://www.sreality.cz", ., sep="")

 [1] "https://www.sreality.cz/detail/prodej/byt/4+1/brno-zabrdovice-tkalcovska/1857071452"
 [2] "https://www.sreality.cz/detail/prodej/byt/3+kk/brno--/1336764508"
 [3] "https://www.sreality.cz/detail/prodej/byt/2+kk/brno-stary-liskovec-u-posty/3639359836"
 [4] "https://www.sreality.cz/detail/prodej/byt/2+1/brno-reckovice-druzstevni/3845994844"
 [5] "https://www.sreality.cz/detail/prodej/byt/2+1/brno-styrice-jilova/1102981468"
 [6] "https://www.sreality.cz/detail/prodej/byt/1+kk/brno-dolni-herspice-/1961502812"
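
The filtering steps in that pipeline can be tried on a plain character vector. This is only a sketch: the hrefs below are invented placeholders standing in for what html_attr('href') might return, including a duplicate and a non-detail link.

```r
library(stringr)

# Invented example hrefs (assumption, not scraped from the site).
hrefs <- c(
  "/detail/prodej/byt/2+kk/example-a/1",
  "/detail/prodej/byt/2+kk/example-a/1",  # duplicate, e.g. image and title both link to it
  "/hledani/prodej/byty",                 # navigation link, not a listing
  "/detail/prodej/byt/1+kk/example-b/2"
)

# Keep only hrefs containing "detail" and drop the repeats.
detail_hrefs <- unique(str_subset(hrefs, "detail"))

# Prepend the domain to build absolute URLs.
links <- paste0("https://www.sreality.cz", detail_hrefs)
```

Because unique() changes the length of the vector, it is worth checking that the result still lines up with the other columns before combining them in a data frame.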

Upvotes: 0
