Reputation: 4008
I'm scraping this page:
https://www.linio.com.pe/c/tv-y-video/televisores
And I want to extract the current price of the TVs. The problem is that some prices are inside a <div> tag and others (fewer) inside a <span> tag.
I'm wondering if it's possible to use an 'ifelse' construct to get all the current prices for the TVs.
library(rvest)
#Reads Linio's HTML
linio <- read_html("https://www.linio.com.pe/c/tv-y-video/televisores", encoding = "ISO-8859-1")
#Extracts prices inside the div tag
linio %>% html_nodes("div.price-section div.price-secondary") %>% html_text()
#Extracts prices inside the span tag
linio %>% html_nodes("div.price-section span.price-secondary") %>% html_text()
I was trying this to combine the prices from the div and the span tags:
linio %>% ifelse(length(html_nodes("div.price-section div.price-secondary") %>% html_text())==0, html_nodes("div.price-section span.price-secondary") %>% html_text(), html_nodes("div.price-section div.price-secondary")) %>% html_text()
Without success... why can't you be consistent, Linio's front-end developers...!
Upvotes: 0
Views: 390
Reputation: 21507
There are multiple ways to accomplish that:
Drop the div/span altogether using:
linio %>% html_nodes("div.price-section .price-secondary") %>% html_text()
This selects all elements with class price-secondary inside div.price-section.
More specific
To select only div and span tags inside div.price-section, you can use:
linio %>%
  html_nodes("div.price-section div.price-secondary, div.price-section span.price-secondary") %>%
  html_text()
For a full CSS selector reference see https://www.w3schools.com/cssref/css_selectors.asp
Minimal CSS selector
To find a minimal CSS selector have a look at http://selectorgadget.com/
In your case this would be:
linio %>% html_nodes(".price-secondary") %>% html_text
This selects all elements with class price-secondary, wherever they appear on the page.
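One caveat with the class-only selector: it is broader than the scoped one, so the two only agree when nothing outside div.price-section carries that class. Here is a minimal sketch on a made-up snippet (the markup and prices below are hypothetical, not Linio's real HTML) that shows the difference:

```r
library(rvest)

# Hypothetical markup for illustration only; not copied from Linio's page
page <- read_html('
<div class="price-section"><div class="price-secondary">S/ 1,299</div></div>
<div class="price-section"><span class="price-secondary">S/ 899</span></div>
<p class="price-secondary">S/ 1</p>
')

# Scoped: any element with the class, but only inside div.price-section
scoped <- page %>% html_nodes("div.price-section .price-secondary") %>% html_text()

# Broad: any element with the class, anywhere in the document
broad <- page %>% html_nodes(".price-secondary") %>% html_text()

length(scoped)  # 2
length(broad)   # 3
```

If the page ever reuses the class elsewhere, the scoped selector is probably the safer choice.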
Test that all return the same result
res1 <- linio %>% html_nodes("div.price-section .price-secondary") %>% html_text()
res2 <- linio %>%
html_nodes("div.price-section div.price-secondary, div.price-section span.price-secondary") %>%
html_text
res3 <- linio %>% html_nodes(".price-secondary") %>% html_text
all(res1 == res2) # TRUE
all(res2 == res3) # TRUE
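A small note on the check itself: == recycles the shorter vector, so all(x == y) can report TRUE even when the lengths differ. identical() is a stricter comparison. A minimal sketch with made-up values (a and b are hypothetical stand-ins for the scraped results):

```r
# Hypothetical price vectors of different lengths
a <- c("S/ 899", "S/ 899")
b <- "S/ 899"

all(a == b)      # TRUE, because b is recycled to the length of a
identical(a, b)  # FALSE, because the lengths differ
```

With the results above, identical(res1, res2) and identical(res2, res3) would be the stricter version of the same test.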
Upvotes: 1