Reputation: 1
I'm trying to scrape news articles from Fox News using rvest. However, I can't find the right node to get the header and URL of each article. Could it be that Fox News is blocking me from scraping their site?
library(rvest)
html_fox <- read_html("https://www.foxnews.com/search-results/search?q=trump")
html_fox %>%
  html_nodes(".article") %>%
  html_text()
When I run this, it returns {xml_nodeset (0)}.
Can anybody help? I've been trying to figure this out for days now and I can't find an answer.
Thanks!
Upvotes: 0
Views: 544
Reputation: 1336
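The search results appear to be filled in by JavaScript after the page loads, so read_html() likely never sees them. One possible solution is the RSelenium library, which drives a real browser. A simple example is below: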
library(RSelenium)
# Start a Selenium server and a Firefox browser
driver <- rsDriver(browser = "firefox", port = 4567L)
# Get the client part, which is used to control the browser
remote_driver <- driver[["client"]]
# Send the search page URL to Firefox
remote_driver$navigate("https://www.foxnews.com/search-results/search?q=trump")
# findElement() returns only the first match, so this gets the first article
first_article <- remote_driver$findElement(using = "xpath", value = '//*[@id="wrapper"]/div[2]/div[2]/div')$getElementText()
print(first_article)
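To get the header and URL of every article rather than the text of one element, one option is to hand the rendered page source back to rvest. The sketch below is only an illustration: extract_articles() is a hypothetical helper, the selector ".article .title a" is a guess at the page's markup (inspect the page and adjust it), and the sample HTML is made up so the code runs without a browser.

```r
library(rvest)

# Hypothetical helper: parse an HTML string and return one row per link
# matched by `selector`. The default selector is an assumption about the
# page's markup, not something confirmed against the live site.
extract_articles <- function(page_source, selector = ".article .title a") {
  page <- read_html(page_source)
  links <- html_nodes(page, selector)
  data.frame(
    header = html_text(links, trim = TRUE),
    url = html_attr(links, "href"),
    stringsAsFactors = FALSE
  )
}

# Made-up stand-in markup so the sketch runs on its own.
sample_source <- '<html><body>
  <article class="article"><h2 class="title"><a href="https://example.com/a">First headline</a></h2></article>
  <article class="article"><h2 class="title"><a href="https://example.com/b">Second headline</a></h2></article>
</body></html>'

articles <- extract_articles(sample_source)
print(articles)

# With the RSelenium session above, the rendered page would be passed instead:
# articles <- extract_articles(remote_driver$getPageSource()[[1]])
```

getPageSource() returns a list, hence the [[1]]. If the results have not finished loading when it is called, the data frame will come back empty, so a short Sys.sleep() after navigate() may be needed.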
Upvotes: 1