Reputation: 19
I tried this code using rvest to extract some nested information from a link, but it returns NA for the last variable, "links".
library("robotstxt")
library("dplyr")
library("rvest")
url<-"https://www.car.gr/classifieds/cars/?fs=1&condition=used&offer_type=sale&modified=15&st=private"
paths_allowed(domain = "https://www.car.gr/classifieds/cars/?fs=1&condition=used&offer_type=sale&modified=15&st=private" )
page<-read_html(url)
Title<-page %>% html_nodes(".title") %>% html_text()
Price<-page %>% html_nodes(".price-fmt") %>% html_text()
links<-page %>% html_nodes(".title") %>%
  html_attr("h2") %>% paste0("https://www.car.gr", .)
Upvotes: 0
Views: 60
Reputation: 696
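The element you are looking for is .row-anchor, not .title, and the link is stored in its href attribute. html_attr("h2") returns NA because h2 is a tag name, not an attribute of the selected element. Try this instead: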
links <- page %>% html_nodes(".row-anchor") %>%
  html_attr("href")
It can be helpful to use the "inspector" in your browser to identify classes. The same tool (in both Firefox and Chrome) lets you run a full-text search for keywords: type in part of a sample link and you will quickly find the tag that holds it.
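To see why the original call gave NA, here is a minimal sketch you can run offline. The HTML snippet is made up for illustration and is only an assumption about how a listing row might look, not the real car.gr markup. It also shows xml2::url_absolute() as one possible way to turn a relative href into a full link, in case the site returns relative paths.

```r
library(rvest)

# Hypothetical markup, NOT the real car.gr page: one listing row whose
# anchor carries the link and whose inner span carries the title.
page <- minimal_html('
  <a class="row-anchor" href="/classifieds/cars/view/123">
    <span class="title">Example car</span>
  </a>
')

# "h2" is not an attribute of the .title element, so html_attr() gives NA
wrong <- page %>% html_nodes(".title") %>% html_attr("h2")

# The link lives in the href attribute of the anchor
right <- page %>% html_nodes(".row-anchor") %>% html_attr("href")

# Resolve the relative path against the site root
full <- xml2::url_absolute(right, "https://www.car.gr")

print(wrong)
print(right)
print(full)
```

If the hrefs on the real page are already absolute, the url_absolute() step should leave them unchanged, so it is probably safer than pasting the domain in front of every link.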
Upvotes: 1