Amanul Haque

Reputation: 11

How to get news based on publishing date using news-please python library

I am trying to curate news articles covering the same story from different media outlets, and I'm using the news-please Python library for this. The following code fetches a single article from a given URL, but I want to get multiple news articles based on specific dates or a date range. Does anyone know how I can do this?

This is the code that works to get news using specific URLs:

from newsplease import NewsPlease

article = NewsPlease.from_url('https://www.nytimes.com/2017/02/23/us/politics/cpac-stephen-bannon-reince-priebus.html?hp')

print(article.title)

Upvotes: 1

Views: 2073

Answers (2)

pedjjj

Reputation: 1038

Shishdem's answer is excellent when you want to get many articles from the common crawl news archive (also called common crawl news crawl, or CCNC).

However, if you are only looking for a few more articles, you can use NewsPlease.from_urls([url1, url2, ...], timeout=6) to crawl them (see https://github.com/fhamborg/news-please#use-within-your-own-code-as-a-library). Note that this call does not support filtering out of the box, so you have to filter the returned articles yourself. For a small number of articles I'd still prefer it over the common crawl news archive variant of news-please. One reason is that, to get a filtered subset of articles from CCNC, you'd theoretically need to process the complete CCNC, since the articles within CCNC are not necessarily ordered by publishing date. For instance, an article from Jan. 1, 2018 by news outlet A could be crawled by CCNC just a day later, whereas another article from Jan. 2, 2018 by publisher B might be crawled a month or even a year later.
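A minimal sketch of the "crawl first, filter yourself" approach could look like the following. It assumes that from_urls returns a dict mapping each URL to an article object and that each article exposes its publishing date as a date_publish attribute (a datetime, a string such as "2018-01-02 12:00:00", or None when extraction failed). The helper names publish_day and filter_by_date are made up for illustration and are not part of news-please. The demo uses stand-in objects so it runs without network access.

```python
from datetime import date, datetime
from types import SimpleNamespace


def publish_day(article):
    """Return the article's publishing day as a date, or None if unknown."""
    value = getattr(article, "date_publish", None)
    if value is None:
        return None
    if isinstance(value, str):
        # Assumption: string dates look like "2018-01-02 12:00:00".
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            return None
    if isinstance(value, datetime):
        return value.date()
    return value if isinstance(value, date) else None


def filter_by_date(articles, start, end):
    """Keep articles published on a day within [start, end], inclusive."""
    kept = {}
    for url, article in articles.items():
        day = publish_day(article)
        if day is not None and start <= day <= end:
            kept[url] = article
    return kept


# Stand-in articles for an offline demo. With news-please installed, the
# dict would instead come from something like:
#   from newsplease import NewsPlease
#   articles = NewsPlease.from_urls([url1, url2], timeout=6)
sample = {
    "https://example.com/a": SimpleNamespace(
        title="A", date_publish=datetime(2018, 1, 1, 9, 30)),
    "https://example.com/b": SimpleNamespace(
        title="B", date_publish="2018-01-02 12:00:00"),
    "https://example.com/c": SimpleNamespace(title="C", date_publish=None),
    "https://example.com/d": SimpleNamespace(
        title="D", date_publish=datetime(2018, 3, 5)),
}

selected = filter_by_date(sample, date(2018, 1, 1), date(2018, 1, 31))
for url, article in selected.items():
    print(url, article.title)
```

Articles without a usable publishing date are dropped here; whether to keep or drop them is a design choice that depends on how complete the collection needs to be.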

Upvotes: 0

Shishdem

Reputation: 498

You can achieve this in one of two ways: extract the publishing date from the article object that is created (its date_publish attribute) and filter on it, or use a WARC file from the common crawl news archive.

More information is available right in the documentation: https://github.com/fhamborg/news-please#use-within-your-own-code-as-a-library

Upvotes: 1
