I'm trying to scrape the product names from Wine.com using rvest:
library(rvest)
wine <- read_html("http://www.wine.com/v6/90-Rated-Under-20/wine/list.aspx?N=7155+2407")
p <- wine %>%
  html_nodes(".listProductName") %>%
  html_text()
I've used a CSS selector tool to check that .listProductName selects the product nodes, but when I run this code nothing is returned. Any ideas?
I suspect the problem is that the nodes you are searching for reside inside an ASPX form, so they are generated only for a browser, not for a plain curl request.
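One quick way to test this diagnosis is to check whether the class name appears anywhere in the HTML that a non-browser request actually receives. The sketch below uses a small hypothetical HTML string in place of the real page so it runs offline; on the live site you would inspect wine from the question instead of page.

```r
library(rvest)

# Hypothetical stand-in for the HTML a plain (non-browser) request receives:
# the product container exists but is empty, because it is only filled in
# when a browser renders the page.
raw_html <- '
<html><body>
  <form id="aspnetForm">
    <div id="productList"></div>
  </form>
</body></html>'

page <- read_html(raw_html)

# How many nodes does the selector match in the static HTML?
n_matches <- length(html_nodes(page, ".listProductName"))

# Does the class name occur anywhere in the raw markup at all?
class_in_source <- grepl("listProductName", as.character(page), fixed = TRUE)

n_matches        # 0
class_in_source  # FALSE
```

If both checks come back empty on the real page, the selector is fine and the content simply is not in the HTML that read_html() downloads.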
I see two solutions to this:
If you are only interested in the product names or something similarly simple, you can use:
p <- wine %>%
  html_nodes('.prodItemInfo_link') %>%
  html_text()
I found that selector by analysing the output of wine %>% html_nodes('a'), looking for a wine name, and noticed that the product links carry that class.
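The inspection step described above can be sketched like this: list every <a> node with its class and text, then see which class holds the wine names. The HTML string and the wine names in it are hypothetical stand-ins for the real page, so the output is illustrative only.

```r
library(rvest)

# Hypothetical static HTML: product links are present, but with a different
# class from the one the rendered page shows in the browser.
raw_html <- '
<html><body>
  <a class="nav" href="/home">Home</a>
  <a class="prodItemInfo_link" href="/p/1">Example Cabernet 2012</a>
  <a class="prodItemInfo_link" href="/p/2">Example Malbec 2013</a>
</body></html>'

page <- read_html(raw_html)
links <- html_nodes(page, "a")

# Pair each link's class with its text to spot which class holds wine names
link_info <- data.frame(
  class = html_attr(links, "class"),
  text  = html_text(links, trim = TRUE),
  stringsAsFactors = FALSE
)
print(link_info)

# Count links per class; the product class stands out
class_counts <- table(link_info$class)

# Once the class is known, select it directly
wines <- html_text(html_nodes(page, ".prodItemInfo_link"), trim = TRUE)
```

On a large page, sorting class_counts in decreasing order is a quick way to surface repeated classes such as product links.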
Alternatively, provided that you have Selenium installed and running, we can simulate a normal browser request with the RSelenium package:
library(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost",
                      port = 4444L,
                      browserName = "htmlunit")
remDr$open()
url <- "http://www.wine.com/v6/90-Rated-Under-20/wine/list.aspx?N=7155+2407"
remDr$navigate(url)
The rendered page should be identical to what we see in a normal browser, so we can select the nodes by their class:
webElems <- remDr$findElements(using = "css selector", value = ".listProductName")
and then extract the text content for each node:
# getElementText() returns a list, so flatten the results into a character vector
wines <- unlist(lapply(webElems, function(x) x$getElementText()))
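If you would rather keep the parsing in rvest, one option (a variant, not what the Selenium steps above do) is to take the rendered HTML from the driver with remDr$getPageSource()[[1]] and parse it with read_html(). The sketch below wraps the parsing in a hypothetical helper, extract_product_names(), and checks it offline against a small made-up snippet instead of the live page.

```r
library(rvest)

# Hypothetical helper: parse rendered HTML (for example the string returned by
# remDr$getPageSource()[[1]]) and return the product names as text.
extract_product_names <- function(page_source) {
  page <- read_html(page_source)
  html_text(html_nodes(page, ".listProductName"), trim = TRUE)
}

# Offline check with a hypothetical rendered snippet
rendered <- '<html><body>
  <span class="listProductName"> Example Rioja 2011 </span>
</body></html>'

names_found <- extract_product_names(rendered)
print(names_found)
```

With a live session this would be called as extract_product_names(remDr$getPageSource()[[1]]) after remDr$navigate(url).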
Hope this helps.