Leehbi

Reputation: 779

CSS Selector issue with RVEST

I'm trying to select products from Wine.com using rvest:

library(rvest)
wine <- read_html("http://www.wine.com/v6/90-Rated-Under-20/wine/list.aspx?N=7155+2407")

p <- wine %>%
  html_nodes(".listProductName") %>%
  html_text()

I've used a CSS selector tool to check that .listProductName selects the product nodes, but when I run this code nothing is returned. Any ideas?

Upvotes: 0

Views: 347

Answers (1)

GGamba

Reputation: 13680

I suspect the problem is that the nodes you are searching for reside inside an ASPX form, so they are generated only for a browser, not for a curl-style request like the one read_html makes.
I see two solutions to this:


Using rvest

If you are only interested in the product name or something equally simple, you can use:

p <- wine %>%
  html_nodes('.prodItemInfo_link') %>%
  html_text()

I got that selector by analysing the result of wine %>% html_nodes('a'), looking for a wine name, and found that the products were inside links with that class.
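
To illustrate that search, here is a minimal sketch. The HTML string is a made-up stand-in for what the server returns to a non-browser request (the nav_link class and the wine names are my own placeholders, not taken from wine.com); with the real page you would start from the wine object instead.

```r
library(rvest)

# Hypothetical HTML standing in for the static response (assumption, not the real page)
html <- '<html><body>
  <a class="nav_link" href="/about">About</a>
  <a class="prodItemInfo_link" href="/p/1">Example Red 2014</a>
  <a class="prodItemInfo_link" href="/p/2">Example White 2015</a>
</body></html>'
page <- read_html(html)

# List every link together with its class so the product links stand out
links <- page %>% html_nodes("a")
link_table <- data.frame(
  class = html_attr(links, "class"),
  text  = html_text(links),
  stringsAsFactors = FALSE
)

# Count how often each class occurs; a repeated class is a good selector candidate
class_counts <- table(link_table$class)

# Use the class found above as the CSS selector
product_names <- page %>%
  html_nodes(".prodItemInfo_link") %>%
  html_text()
```

Printing link_table and scanning the text column for a known wine name is usually enough to spot the right class.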


Using RSelenium

Provided that you have Selenium installed and running, we can simulate a normal browser request:

library(RSelenium)

# Connect to a Selenium server assumed to be listening on localhost:4444
remDr <- remoteDriver(remoteServerAddr = "localhost",
                      port = 4444L,
                      browserName = "htmlunit")
remDr$open()
url <- "http://www.wine.com/v6/90-Rated-Under-20/wine/list.aspx?N=7155+2407"
remDr$navigate(url)

This page should be identical to what we see in a normal browser, so we can select the nodes by their class:

webElems <- remDr$findElements(using = 'class name', 'listProductName')

and then extract the text content for each node:

wines <- sapply(webElems, function(x) x$getElementText())
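
As far as I know, getElementText() returns a list for each element, so wines ends up as a list rather than a character vector. A small sketch of flattening it follows; because it should run without a Selenium server, make_fake_element is a hypothetical helper that only mimics that return shape, and the wine names are placeholders.

```r
# Hypothetical stand-in for an RSelenium web element (assumption: the real
# getElementText() returns a list holding one string)
make_fake_element <- function(text) {
  list(getElementText = function() list(text))
}

webElems <- list(
  make_fake_element("Example Red 2014"),
  make_fake_element("Example White 2015")
)

# Same call as above: the result is a list with one string per element
wines <- sapply(webElems, function(x) x$getElementText())

# Flatten it into a plain character vector
wine_names <- unlist(wines)
```

With a live session the same unlist() call can be applied directly to the wines object from the line above.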

Hope this helps

Upvotes: 3
