schneebii

Reputation: 1

Scraping Google News with Rvest for Keywords

I want to compare news articles from different countries by how often they use a specific keyword.

My idea is to scrape Google News using Rcrawler:

Rcrawler(Website = "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmtaU2dBUAE?hl=de&gl=DE&ceid=DE%3Ade", MaxDepth = 5, KeywordsFilter = c("Keyword"), KeywordsAccuracy = 99)

Then I would just count the results I get back. I'm not sure whether this is the best method, or whether it's even correct, but I'm new to R and it's the best approach I can currently think of.

Upvotes: 0

Views: 1109

Answers (1)

Aman

Reputation: 417

Since you're using Google News, instead of scraping this way, an easier method would be to access the RSS feed for that particular keyword and pull that into a dataframe. Luckily, there is the {tidyRSS} package that you can use to do just this.

For example, the feed for the keyword "apple" is available at this URL:

https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en

You can customize this URL: q holds the search term, while hl, gl, and ceid select the language and country edition, so you can search by geolocation if you wish.
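As a rough sketch of that customization, the URL can be assembled from a keyword, a language code, and a country code. The helper name build_feed_url is hypothetical, and the parameter pattern is only inferred from the example URL above, not taken from official documentation.

```r
# Hypothetical helper (not part of any package): assemble a Google News
# RSS search URL following the pattern of the example URL above.
build_feed_url <- function(keyword, lang = "en", country = "IN") {
  paste0(
    "https://news.google.com/rss/search",
    "?q=", utils::URLencode(keyword, reserved = TRUE),  # escape spaces etc.
    "&hl=", lang, "-", country,   # language and region, e.g. en-IN
    "&gl=", country,              # country, e.g. IN
    "&ceid=", country, ":", lang  # edition id, e.g. IN:en
  )
}

# With the defaults this reproduces the example URL for "apple".
build_feed_url("apple")
```

Calling it with other codes, for instance build_feed_url("apple", "de", "DE"), would give one URL per country to compare.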

After you install tidyRSS, you can implement it like so:

library(tidyRSS)

# RSS feed URL for a search on the keyword "apple"
feed_url <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

# Call adapted from the package vignette
google_news <- tidyfeed(
  feed_url,
  clean_tags = TRUE,   # strip HTML tags from text fields
  parse_dates = TRUE   # convert date strings to date-time values
)

This gives you a data frame (a tibble) with one row per article and many columns that describe it, such as the title, link, and publication date. You can choose which ones to keep.
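To get back to the original goal of comparing countries, one option is to fetch one feed per country and count the articles whose title mentions the keyword. The following is only a sketch under assumptions: the helper names are hypothetical, the title column is assumed to be called item_title, and the US URL simply follows the same pattern as the India example.

```r
# Count how many titles mention the keyword (case-insensitive, literal match).
count_keyword <- function(titles, keyword) {
  sum(grepl(tolower(keyword), tolower(titles), fixed = TRUE))
}

# Fetch one feed and count matching titles.
# Assumes the tidyRSS package is installed and that the result has an
# item_title column.
count_in_feed <- function(feed_url, keyword) {
  feed <- tidyRSS::tidyfeed(feed_url)
  count_keyword(feed$item_title, keyword)
}

# One feed URL per country edition; the US URL is an assumption that
# follows the same pattern as the India example.
feeds <- c(
  IN = "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en",
  US = "https://news.google.com/rss/search?q=apple&hl=en-US&gl=US&ceid=US:en"
)

# Needs internet access; uncomment to run.
# sapply(feeds, count_in_feed, keyword = "apple")
```

Keep in mind that a feed only returns a limited number of recent items that already match the search, so these counts are a rough indicator rather than a complete tally.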

Upvotes: 2
