Conor Neilson

Reputation: 1091

rvest scrape links from webpage

I'm using rvest to scrape some links from the magazine 'The Hustle'. This is the code I've used:

library(rvest)

page <- read_html("https://thehustle.co/daily/page/33/") %>% 
  html_nodes(".daily-article-title") %>% 
  html_attr('href')

However, this returns a vector of 30 NAs. I used SelectorGadget to find the class, so I'm not sure what is going wrong here.

Upvotes: 0

Views: 1097

Answers (1)

Ronak Shah

Reputation: 389235

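The links sit above the elements with class 'daily-article-title' in the markup, not on those elements themselves, so html_attr('href') finds no href there and returns NA for each match. Here is a way to get the titles and the corresponding links.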

library(rvest)

webpage <- read_html("https://thehustle.co/daily/page/33/")

# Extract the text of each article title
title <- webpage %>%
  html_nodes("h3.daily-article-title") %>%
  html_text()

title

# [1] "\nApple buys itself a $400m Christmas present\n"          
# [2] "\nSan Francisco wages war on robots\n"                    
# [3] "\nThe US could lose its greatest export\n"                
# [4] "\n\"Mom, where do podcasts come from?\"\n"                
# [5] "\nTencent Music to team up with Spotify?\n"               
# [6] "\nFirst rule of the Farmers Business Network?\n"          
# [7] "\nSpiegel goes HAM on social media\n"                     
# [8] "\nBanks won't take weed companies’ cash\n"                
# [9] "\nThe Koch bros just took a $650m stake in Time\n"        
#[10] "\n4 mins to smarter Monday smalltalk\n"     
#...
#...
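
The titles above come back wrapped in newline characters. If that is unwanted, one option is html_text(trim = TRUE). Below is a minimal sketch against a small made-up HTML snippet; the markup and the `snippet` name are assumptions for illustration, not taken from the site.

```r
library(rvest)

# Hypothetical markup, not copied from thehustle.co
snippet <- read_html('<div>
  <h3 class="daily-article-title">
San Francisco wages war on robots
</h3>
</div>')

# Default behaviour: surrounding whitespace is kept
raw_title <- snippet %>%
  html_nodes("h3.daily-article-title") %>%
  html_text()

# trim = TRUE strips leading and trailing whitespace
clean_title <- snippet %>%
  html_nodes("h3.daily-article-title") %>%
  html_text(trim = TRUE)
```

Calling trimws(title) on the vector already extracted should give the same result without re-reading the page.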
           
# Extract the href of each <a> inside the article wrapper
link <- webpage %>%
  html_nodes("[class='col-md-12 daily-wrap clearfix'] a") %>%
  html_attr('href')

link

# [1] "https://thehustle.co/apple-christmas-present"                          
# [2] "https://thehustle.co/war-on-robots"                                    
# [3] "https://thehustle.co/big-data-trade-nafta-daily"                       
# [4] "https://thehustle.co/apple-podcast-market"                             
# [5] "https://thehustle.co/tencent-spotify-truce"                            
# [6] "https://thehustle.co/first-rule-of-farmers"                            
# [7] "https://thehustle.co/snap-anti-facebook"                               
# [8] "https://thehustle.co/weed-banking"                                     
# [9] "https://thehustle.co/pepshi-bros"                                      
#[10] "https://thehustle.co/rundown"    
#...
#...                              
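
To see why the original attempt produced NAs, and as one possible way to keep each title lined up with its link, here is a sketch on a small made-up HTML snippet. The markup (an `<a>` wrapping each `<h3>`), the example.com URLs, and the `div.daily-wrap` selector are assumptions for illustration; the real page structure should be checked in the browser inspector.

```r
library(rvest)

# Hypothetical markup: assumed structure, not copied from the site
snippet <- read_html('
<div class="col-md-12 daily-wrap clearfix">
  <a href="https://example.com/first"><h3 class="daily-article-title">First</h3></a>
</div>
<div class="col-md-12 daily-wrap clearfix">
  <a href="https://example.com/second"><h3 class="daily-article-title">Second</h3></a>
</div>')

# The <h3> elements carry no href attribute, so html_attr() returns NA for each
no_links <- snippet %>%
  html_nodes(".daily-article-title") %>%
  html_attr("href")

# Select each wrapper once, then pull one title and one link per wrapper.
# html_node() (singular) returns exactly one result per wrapper.
wrappers <- snippet %>% html_nodes("div.daily-wrap")

articles <- data.frame(
  title = wrappers %>% html_node("h3.daily-article-title") %>% html_text(trim = TRUE),
  link  = wrappers %>% html_node("a") %>% html_attr("href"),
  stringsAsFactors = FALSE
)
```

Because html_node() yields one result per wrapper, a wrapper without a link would give NA in its own row rather than shifting the later links out of step with their titles.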

Upvotes: 1
