Kagi Fret

Reputation: 1

Scraping News Articles using rvest

I'm trying to scrape news articles from FoxNews using rvest. However, I can't find the right node to get the headline and URL of each article. Could it be that FoxNews is blocking me from scraping their site?

library(rvest)

html_fox <- read_html("https://www.foxnews.com/search-results/search?q=trump")

html_fox %>% 
  html_nodes(".article") %>% 
  html_text()

When I run this, it returns {xml_nodeset (0)}.

Can anybody help? I've been trying to figure this out for days now and I can't find an answer.

Thanks!

Upvotes: 0

Views: 544

Answers (1)

Earl Mascetti

Reputation: 1336

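One possible solution is the RSelenium library. The search results are probably rendered by JavaScript, which read_html() does not execute, so the static HTML contains no article nodes. RSelenium drives a real browser, so the rendered content is available.

Below is a simple example: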

library(RSelenium) 

# Start a Selenium server and a Firefox browser
driver <- rsDriver(browser = "firefox", port = 4567L)

# Get the client, which is used to control the browser
remote_driver <- driver[["client"]]

# Send the website address to Firefox
remote_driver$navigate("https://www.foxnews.com/search-results/search?q=trump")

# Get the text of the results container
# (findElement() returns only the first element matching the XPath)
all_articles <- remote_driver$findElement(using = 'xpath', value = '//*[@id="wrapper"]/div[2]/div[2]/div')$getElementText()
print(all_articles)
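
If you would rather keep using rvest for the extraction, you could hand the rendered page source from the browser to read_html(). This is only a rough sketch: the ".article" selector comes from the question, ".title a" is my guess at where the headline link lives, and the 5-second wait is arbitrary. Check the selectors against the page in your browser's inspector before relying on them.

```r
library(RSelenium)
library(rvest)

# Start the browser and load the search page, as in the example above
driver <- rsDriver(browser = "firefox", port = 4567L)
remote_driver <- driver[["client"]]
remote_driver$navigate("https://www.foxnews.com/search-results/search?q=trump")

# Give the page's JavaScript time to render the results (arbitrary wait)
Sys.sleep(5)

# getPageSource() returns a list; its first element is the rendered HTML as a string
page_source <- remote_driver$getPageSource()[[1]]

# Parse the rendered HTML with rvest and query it as in the question
# NOTE: ".article" and ".title a" are assumed selectors, not verified against the site
articles <- read_html(page_source) %>% html_nodes(".article")

headlines <- articles %>% html_nodes(".title a") %>% html_text(trim = TRUE)
urls      <- articles %>% html_nodes(".title a") %>% html_attr("href")

print(data.frame(headline = headlines, url = urls))

# Close the browser and stop the Selenium server when done
remote_driver$close()
driver[["server"]]$stop()
```

If the selectors match nothing, you will get an empty data frame rather than an error, so inspect the page structure first.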

Upvotes: 1
