Spink

Reputation: 19

Extract nested information from a link using rvest

I tried this code using rvest to extract some nested information from a link, but it returns NA for the last variable, "links".

library("robotstxt")
library("dplyr")
library("rvest")

url<-"https://www.car.gr/classifieds/cars/?fs=1&condition=used&offer_type=sale&modified=15&st=private"

paths_allowed(paths = url)

page<-read_html(url)

Title<-page %>% html_nodes(".title") %>% html_text()

Price<-page %>% html_nodes(".price-fmt") %>% html_text()

links<-page %>% html_nodes(".title") %>% 
       html_attr("h2") %>% paste0("https://www.car.gr", .)

Upvotes: 0

Views: 60

Answers (1)

Datapumpernickel

Reputation: 696

The class you are looking for is not .title but .row-anchor. Also, html_attr("h2") returns NA because h2 is a tag name, not an attribute; the URL is stored in the href attribute:

links <- page %>% html_nodes(".row-anchor") %>% 
       html_attr("href")

It can help to use the inspector in your browser's developer tools to identify classes. In the same tool (in both Firefox and Chrome) you can run a full-text search for keywords: search for a sample link and you will quickly find the tag that holds it.
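
To see the difference without hitting the live site, here is a minimal sketch on an inline HTML snippet. The markup is hypothetical (the real page structure may differ), and prepending the domain with paste0 assumes the href values are relative paths, as in the question's code.

```r
library(rvest)

# Hypothetical markup standing in for two listings on the page
html <- '<div>
  <a class="row-anchor" href="/classifieds/cars/view/1">
    <h2 class="title">Car A</h2>
  </a>
  <a class="row-anchor" href="/classifieds/cars/view/2">
    <h2 class="title">Car B</h2>
  </a>
</div>'

page <- read_html(html)

# "h2" is a tag name, not an attribute, so html_attr() returns NA for every node
wrong <- page %>% html_nodes(".title") %>% html_attr("h2")

# href is an attribute of the anchor elements
links <- page %>% html_nodes(".row-anchor") %>% html_attr("href")

# Assumption: the hrefs are relative, so prepend the site root
full_links <- paste0("https://www.car.gr", links)
```

If the hrefs on the real page are already absolute, skip the paste0 step.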

Upvotes: 1
