How to Scrape Google RSS Feed Links?

Question

I'm trying to scrape the links found in Google News RSS feeds for a given country.

The links appear in the XML returned when you visit this URL: https://news.google.com/rss/search?q={"Example_Country"}

I am able to parse the links out of the feed, but when I request them with requests, the response is JavaScript rather than the article page I get when I click the same link in a browser.

What are these links that Google uses in the XML RSS feed, and what is actually returned when I pass them to requests.get? Ultimately, I want to know the best way to get the actual article links so that I can scrape them.

So far I am able to parse the XML from https://news.google.com/rss/search?q={"Example_Country"}.
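
A minimal sketch of this parsing step, assuming the feed follows the standard RSS 2.0 layout (`channel/item/link`). The helper names `build_feed_url` and `extract_links`, the sample feed, and the `EXAMPLE_ID_*` links are illustrative placeholders, not taken from the real feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical sample standing in for the XML body of the real feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First story</title>
      <link>https://news.google.com/rss/articles/EXAMPLE_ID_1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://news.google.com/rss/articles/EXAMPLE_ID_2</link>
    </item>
  </channel>
</rss>"""


def build_feed_url(query):
    """Build the search feed URL, URL-encoding the query (e.g. a country name)."""
    return "https://news.google.com/rss/search?" + urlencode({"q": query})


def extract_links(xml_text):
    """Return the text of every <link> that sits inside a channel <item>."""
    root = ET.fromstring(xml_text)
    links = []
    for item in root.findall("channel/item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


feed_url = build_feed_url("New Zealand")
links = extract_links(SAMPLE_FEED)
```

With a live feed, the text of the response fetched from `feed_url` would be passed to `extract_links` in place of `SAMPLE_FEED`.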

But when I try the following approach:

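`import requests

# `url` is one of the <link> values parsed from the RSS feed.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# allow_redirects=True is already the default for requests.get; kept to make the intent explicit.
response = requests.get(url, headers=headers, allow_redirects=True)
response.encoding = 'utf-8'`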

I can't tell what is being returned. I was expecting to be redirected to the actual article URL.
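
One way to see what came back is to summarise the response before reading the body. The sketch below is only a diagnostic aid: `describe_response` is a hypothetical helper, and the `fake` object is a stand-in with made-up values so the snippet runs without network access. It relies only on attributes that a `requests` response exposes (`status_code`, `url`, `history`, `headers`, `text`).

```python
from types import SimpleNamespace


def describe_response(response):
    """Summarise a requests-style response: redirect chain, final URL, content type."""
    return {
        "status": response.status_code,
        "final_url": response.url,
        # response.history holds the intermediate responses of any HTTP redirects.
        "redirects": [r.url for r in response.history],
        "content_type": response.headers.get("Content-Type", ""),
        # The first characters are usually enough to tell HTML/JavaScript from an article.
        "body_start": response.text[:200],
    }


# Stand-in object with invented values; replace with the real `response`.
fake = SimpleNamespace(
    status_code=200,
    url="https://news.google.com/rss/articles/EXAMPLE_ID",
    history=[],
    headers={"Content-Type": "text/html; charset=utf-8"},
    text="<!doctype html><html><head><script>/* ... */</script></head></html>",
)

summary = describe_response(fake)
```

If `redirects` is empty and `final_url` is still on news.google.com, no HTTP redirect took place. `allow_redirects=True` only follows HTTP 3xx responses, so a page that navigates with JavaScript would be returned as-is.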
