zorian15

Reputation: 55

How to scrape web data in R that requires clicking a link?

I'm new to web scraping and wanted to learn by scraping the Keurig website for fun, extracting information about some of the K-Cups for sale. My goal is to go to the K-Cups page, click on every K-Cup, and extract details such as whether it is caffeinated, the roast color, and maybe the origin. I can tackle that part later; right now I'm having trouble finding the right CSS selector, or a way to automate clicking every item to get the extra info. I did this:

library(rvest)
keurig <- read_html("http://www.keurig.com/beverages/k-cup-pods")
# Grab the CSS Nodes from the website
keurig.html <- html_nodes(keurig, ".keurig_card")
keurig.text <- html_text(keurig.html)
# Print the text
keurig.text

I ended up with a lot of tab and newline characters, with some of the coffee names in between. How would I scrape this data to grab the info about every K-Cup?

Upvotes: 0

Views: 974

Answers (1)

R. Schifini

Reputation: 9313

Use this to get the links for every item:

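library(rvest)
keurig <- read_html("http://www.keurig.com/beverages/k-cup-pods")
# Select the elements that carry each product's link
keurig.html <- html_nodes(keurig, ".product_name")
# Pull the href attribute out of every selected node
links <- html_attr(keurig.html, name = "href")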

The links to every item are on the elements with the class product_name. Once you have those nodes, extract their href attribute.

Result (first four shown):

 [1] "/Beverages/Coffee/Regular/Breakfast-Blend-Coffee/p/Breakfast-Blend-Coffee-K-Cup-Green-Mountain"                          
 [2] "/Beverages/Coffee/Regular/Dark-Magic%C2%AE-Extra-Bold-Coffee/p/Dark-Magic-Extra-Bold-Coffee-K-Cup-Green-Mountain"        
 [3] "/Beverages/Coffee/Regular/The-Original-Donut-Shop%C2%AE-Coffee/p/Original-Donut-Shop-Extra-Bold-Coffee-K-Cup-CP"         
 [4] "/Beverages/Coffee/Regular/Nantucket-Blend%C2%AE-Coffee/p/Nantucket-Blend-Coffee-K-Cup-Green-Mountain"
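
The tabs and newlines mentioned in the question come from whitespace in the page markup, and html_text can trim them. Here is a minimal sketch on an inline HTML snippet; the snippet is made up for illustration (it is not the real Keurig markup), and item_names is just an illustrative variable name:

```r
library(rvest)

# Made-up markup standing in for one product card (an assumption, not the real page)
snippet <- read_html(
  '<div class="keurig_card">\n\t\t<a class="product_name" href="/p/Example-Blend">\n\t\t\tExample Blend\n\t\t</a>\n</div>'
)

nodes <- html_nodes(snippet, ".product_name")

# trim = TRUE strips leading and trailing whitespace from each node's text
item_names <- html_text(nodes, trim = TRUE)
item_links <- html_attr(nodes, "href")

# Keep each name next to its link
products <- data.frame(name = item_names, link = item_links,
                       stringsAsFactors = FALSE)
products
```

Selecting the narrower .product_name nodes instead of the whole .keurig_card also avoids picking up unrelated text from the rest of the card.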

Then use paste0 to build the full link to each k-cup's details page. The hrefs are root-relative (they start with /), so prepend only the site root, not the listing-page URL:

paste0("http://www.keurig.com", 
       "/Beverages/Coffee/Regular/Breakfast-Blend-Coffee/p/Breakfast-Blend-Coffee-K-Cup-Green-Mountain")

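To "click" every link, you can loop over the links vector and read each details page in turn. The following is only a rough sketch: build_urls, scrape_detail and scrape_all are hypothetical helper names, the CSS selector for the details page is a placeholder you would need to find in the real page source, and it assumes the hrefs are root-relative so that the site root is the right prefix.

```r
library(rvest)

# Site root; the root-relative hrefs are appended to this (assumption)
base_url <- "http://www.keurig.com"

# paste0 is vectorised, so this converts the whole links vector at once
build_urls <- function(links, base = base_url) {
  paste0(base, links)
}

# Hypothetical helper: fetch one details page and return the trimmed text
# of every node matching `css` (the selector is a placeholder)
scrape_detail <- function(url, css) {
  page <- read_html(url)
  html_text(html_nodes(page, css), trim = TRUE)
}

# Visit every page, pausing between requests to go easy on the server
scrape_all <- function(links, css, delay = 1) {
  urls <- build_urls(links)
  out <- lapply(urls, function(u) {
    Sys.sleep(delay)
    scrape_detail(u, css)
  })
  names(out) <- urls
  out
}
```

With the links vector from above, a call would look like scrape_all(links, ".some-selector"), which returns a list with one character vector per details page. If the details are filled in by JavaScript after the page loads, read_html will not see them and a browser-driving tool would be needed instead.
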
Upvotes: 1
