Reputation: 1
I want to compare news articles from different countries by how often they use a specific keyword.
My idea is to scrape Google News using Rcrawler:
Rcrawler(Website = "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmtaU2dBUAE?hl=de&gl=DE&ceid=DE%3Ade", MaxDepth = 5, KeywordsFilter = c("Keyword"), KeywordsAccuracy = 99)
I would then just count the results I get back. I'm not sure if this is the best method, or if it's even correct, but I'm new to R and it's the best approach I can currently think of.
Upvotes: 0
Views: 1109
Reputation: 417
Since you're using Google News, instead of scraping this way, an easier method would be to access the RSS feed for that particular keyword and pull that into a dataframe. Luckily, there is the {tidyRSS} package that you can use to do just this.
Here is an example of what a feed URL looks like:
https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en
You can customize this URL: q is the search term, and the hl, gl, and ceid parameters select the language and country edition, so you can search by geolocation if you wish.
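Since the goal is to compare countries, it may help to build one feed URL per edition. Here is a minimal sketch: build_feed_url is a hypothetical helper (not part of any package), and the hl=<lang>-<country> pattern is simply copied from the example URL above. Some editions may expect a bare language code instead (the question's URL uses hl=de), so treat the format as an assumption to check per country.

```r
# Hypothetical helper: build a Google News RSS search URL for one edition.
# The parameter layout mirrors the example URL; it is an assumption that
# the same pattern works for every language/country pair.
build_feed_url <- function(query, lang, country) {
  paste0(
    "https://news.google.com/rss/search?q=",
    utils::URLencode(query, reserved = TRUE),  # escape spaces etc.
    "&hl=", lang, "-", country,
    "&gl=", country,
    "&ceid=", country, ":", lang
  )
}

# Illustrative list of editions to compare.
editions <- data.frame(
  lang    = c("en", "de", "en"),
  country = c("IN", "DE", "US"),
  stringsAsFactors = FALSE
)

# One URL per edition, all for the same keyword.
feed_urls <- mapply(
  build_feed_url, "apple", editions$lang, editions$country,
  USE.NAMES = FALSE
)
```

Each element of feed_urls can then be passed to tidyfeed() in the same way as the single URL in the example that follows.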
After you install tidyRSS, you can implement it like so:
library(tidyRSS)

# I will search for the keyword Apple
keyword <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

# From the package vignette
google_news <- tidyfeed(
  keyword,
  clean_tags = TRUE,
  parse_dates = TRUE
)
This gives you a dataframe with many variables that describe each article. You can choose which ones to keep.
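To get from that dataframe to the count the question asks for, something along these lines might work. It is only a sketch: count_keyword is a hypothetical helper, and it assumes the article headlines are in a column named item_title, which you should confirm with names(google_news). The toy data frame stands in for real feed output so the example runs without a network connection.

```r
# Hypothetical helper: count rows whose text column mentions the keyword
# (case-insensitive, literal match rather than a regular expression).
count_keyword <- function(feed_df, keyword, column = "item_title") {
  text <- tolower(as.character(feed_df[[column]]))
  sum(grepl(tolower(keyword), text, fixed = TRUE))
}

# Toy stand-in for the output of tidyfeed(); the titles are made up.
toy_feed <- data.frame(
  item_title = c(
    "Apple unveils new phone",
    "Markets rally",
    "Why apple prices rose"
  ),
  stringsAsFactors = FALSE
)

n_hits <- count_keyword(toy_feed, "apple")
```

With real data you would call count_keyword(google_news, "apple") once per country feed and compare the numbers. Keep in mind that a feed only lists a limited set of recent items, so the counts are a sample rather than a total.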
Upvotes: 2