stats_noob

Reputation: 5925

Include Multiple Search Terms in an HTML Request

I found this post that shows how to search for news articles on Google using R: Scraping Google News with Rvest for Keywords

This post shows how to search for a single term, for example: keyword <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

Could I write the query like this?

library(tidyRSS)

# I have a feeling that "IN" stands for "India". If I want to change this to "Canada", do I need to replace "IN" with "CAN"?

keyword <- "https://news.google.com/rss/search?q=apple&q=covid&hl=en-IN&gl=IN&ceid=IN:en"

# From the package vignette

google_news <- tidyfeed(
    keyword,
    clean_tags = TRUE,
    parse_dates = TRUE
)

Is this correct?

Thank you!

PS: I wonder if there is a way to restrict the dates between which the search will be performed?

Upvotes: 1

Views: 75

Answers (1)

akrun

Reputation: 887501

To search for multiple terms, join them with OR if either term may match, or with AND if both must be present. The hl parameter sets the language and gl sets the country. To restrict the date range, add the before: and after: keywords to the query:

library(tidyRSS)
keyword <- "https://news.google.com/rss/search?q=apple%20AND%20covid+after:2022-07-01+before:2022-08-02&hl=en-US&gl=US&ceid=US:en"
google_news <- tidyfeed(
    keyword,
    clean_tags = TRUE,
    parse_dates = TRUE
)
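The same URL can be assembled from its parts rather than typed by hand. The following is only a sketch: `build_news_url` and its arguments are hypothetical names, not part of tidyRSS, and it assumes Google News treats the percent-encoded query the same way as the hand-written one above.

```r
# Hypothetical helper (not part of tidyRSS): build a Google News RSS search URL.
build_news_url <- function(terms, operator = "AND", after = NULL, before = NULL,
                           lang = "en", country = "US") {
  operator <- match.arg(operator, c("AND", "OR"))
  # Join the search terms with the chosen boolean operator.
  query <- paste(terms, collapse = paste0(" ", operator, " "))
  # Append the optional date filters described in the answer.
  if (!is.null(after))  query <- paste0(query, " after:", after)
  if (!is.null(before)) query <- paste0(query, " before:", before)
  # Percent-encode the query (spaces become %20) and add the locale parameters.
  paste0(
    "https://news.google.com/rss/search?q=",
    utils::URLencode(query, reserved = TRUE),
    "&hl=", lang, "-", country,
    "&gl=", country,
    "&ceid=", country, ":", lang
  )
}

url <- build_news_url(c("apple", "covid"),
                      after = "2022-07-01", before = "2022-08-02")
print(url)
```

The resulting string can be passed to tidyfeed() in place of keyword. For another country, pass a different country value, for example country = "CA", assuming gl and ceid expect two-letter codes as the IN and US values in this thread suggest.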

Checking the date range:

library(dplyr)
> all(between(as.Date(google_news$feed_pub_date), 
   as.Date("2022-07-01"), as.Date("2022-08-02")))
[1] TRUE
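If some articles do fall outside the window, the rows can be filtered locally after download. This is a minimal base R sketch on made-up data; the column name item_pub_date is an assumption about the tibble that tidyfeed() returns, so check names(google_news) before adapting it.

```r
# Mock data standing in for the result of tidyfeed(); all values are made up.
news <- data.frame(
  item_title = c("story a", "story b", "story c"),
  item_pub_date = as.POSIXct(
    c("2022-07-03 10:00:00", "2022-07-20 08:30:00", "2022-08-05 12:00:00"),
    tz = "UTC"
  )
)

start <- as.Date("2022-07-01")
end   <- as.Date("2022-08-02")

# Convert timestamps to dates, then flag rows inside the inclusive range.
pub_dates <- as.Date(news$item_pub_date, tz = "UTC")
in_range  <- pub_dates >= start & pub_dates <= end

# Keep only the rows whose publication date lies inside the range.
filtered <- news[in_range, ]
print(filtered$item_title)
```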

Upvotes: 1
