Omar Gonzales

Reputation: 4008

R: rvest: How to use ifelse for nodes?

I'm scraping this page:

https://www.linio.com.pe/c/tv-y-video/televisores

I want to extract the current price of the TVs. The problem is that some prices are inside a <div> tag and others (fewer) are inside a <span> tag.

I'm wondering if it's possible to use an 'ifelse' construct to get all the current prices for the TVs.

library(rvest)

# Reads Linio's HTML
linio <- read_html("https://www.linio.com.pe/c/tv-y-video/televisores", encoding = "ISO-8859-1")


#Extracts prices inside the div tag


linio %>% html_nodes("div.price-section div.price-secondary") %>% html_text()


#Extracts prices inside the span tag

linio %>% html_nodes("div.price-section span.price-secondary") %>% html_text()

I was trying this to combine the prices from the div and the span tags:

linio %>% ifelse(length(html_nodes("div.price-section div.price-secondary") %>% html_text())==0, html_nodes("div.price-section span.price-secondary") %>% html_text(), html_nodes("div.price-section div.price-secondary")) %>% html_text()

Without success... why can't you be consistent, Linio's front-end developers...!

Upvotes: 0

Views: 390

Answers (1)

Rentrop

Reputation: 21507

There are multiple ways to accomplish that:

Drop the div/span altogether using:

linio %>% html_nodes("div.price-section .price-secondary") %>% html_text()

This selects all elements with class price-secondary inside div.price-section.
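To see this without hitting the live site, here is a minimal sketch on made-up markup. The HTML string and the prices in it are assumptions for illustration, not Linio's real page; only the class names come from the question.

```r
library(rvest)

# Hypothetical markup: one price sits in a <div>, the other in a <span>
page <- read_html(paste0(
  '<div class="price-section"><div class="price-secondary">S/ 1,299</div></div>',
  '<div class="price-section"><span class="price-secondary">S/ 899</span></div>'
))

# The descendant selector ignores the tag name and matches on the class only
prices <- page %>%
  html_nodes("div.price-section .price-secondary") %>%
  html_text()

prices
# "S/ 1,299" "S/ 899"
```

Both prices come back in document order, regardless of which tag holds them.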

More specific
To select only div and span tags inside div.price-section, you can use:

linio %>%
  html_nodes("div.price-section div.price-secondary, div.price-section span.price-secondary") %>%
  html_text()

For a full CSS selector reference see https://www.w3schools.com/cssref/css_selectors.asp

Minimal CSS selector
To find a minimal CSS selector, have a look at http://selectorgadget.com/

In your case this would be:

linio %>% html_nodes(".price-secondary") %>% html_text()

This selects all elements with class price-secondary, wherever they appear on the page.
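One caveat with the minimal selector: it is not scoped to div.price-section, so it would also pick up the class if it were reused elsewhere on the page. A small sketch on made-up markup (the "promo" block and all prices are hypothetical) shows the difference:

```r
library(rvest)

# Hypothetical markup: the same class is reused outside div.price-section
page <- read_html(paste0(
  '<div class="price-section"><span class="price-secondary">S/ 899</span></div>',
  '<div class="promo"><span class="price-secondary">S/ 10</span></div>'
))

# Scoped: only matches inside div.price-section
scoped <- page %>%
  html_nodes("div.price-section .price-secondary") %>%
  html_text()

# Unscoped: matches the class anywhere in the document
unscoped <- page %>%
  html_nodes(".price-secondary") %>%
  html_text()

length(scoped)    # 1
length(unscoped)  # 2
```

If the two lengths differ on the real page, the scoped selector is probably the safer choice.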

Test that all return the same result

res1 <- linio %>% html_nodes("div.price-section .price-secondary") %>% html_text()
res2 <- linio %>%
  html_nodes("div.price-section div.price-secondary, div.price-section span.price-secondary") %>%
  html_text()
res3 <- linio %>% html_nodes(".price-secondary") %>% html_text()
all(res1 == res2) # TRUE
all(res2 == res3) # TRUE
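As for the ifelse attempt in the question: ifelse() returns a result shaped like its test, so a length-1 test such as length(...) == 0 can only ever yield a single element. If you do want an ifelse-based version, one possible approach (a sketch on made-up markup, not tested against Linio's page) is to go product by product: html_node(), singular, returns one result per price section, and html_text() gives NA where the tag is missing.

```r
library(rvest)

# A length-1 test gives a length-1 result: only "a" comes back
ifelse(TRUE, c("a", "b"), "c")

# Hypothetical markup: one price in a <div>, one in a <span>
page <- read_html(paste0(
  '<div class="price-section"><div class="price-secondary">S/ 1,299</div></div>',
  '<div class="price-section"><span class="price-secondary">S/ 899</span></div>'
))

sections <- page %>% html_nodes("div.price-section")

# One entry per section; NA where that tag is absent
div_price  <- sections %>% html_node("div.price-secondary") %>% html_text()
span_price <- sections %>% html_node("span.price-secondary") %>% html_text()

# The test now has one element per section, so ifelse() works element-wise
prices <- ifelse(is.na(div_price), span_price, div_price)
prices
```

This keeps one price per product in page order, which the plain selectors above also do with less code.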

Upvotes: 1
