Reputation: 1
I am just starting with Scrapy and trying to develop a project where I scrape news links from websites. For example, I would like to scrape the news from iltalehti.fi, say every 5 minutes. Since each crawl will return duplicates, how do I avoid those duplicates being stored in my database? The end result should be a database containing only unique entries, never the same news link twice (or 200 times, if I run the crawler every 5 minutes).
Any help is more than welcome, and please note that I know very little about Python!
Upvotes: 0
Views: 686
Reputation: 4378
Scrapy uses item pipelines to do extra processing (validating and filtering) on the data scraped from websites.
You can write a pipeline that checks items for uniqueness and drops those that are duplicates.
Here is an example from the Scrapy docs:
from scrapy.exceptions import DropItem

class DuplicatesPipeline(object):

    def __init__(self):
        # Keeps the ids of all items seen during this crawl.
        self.ids_seen = set()

    def process_item(self, item, spider):
        if item['id'] in self.ids_seen:
            # Drop any item whose id has already been processed.
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.ids_seen.add(item['id'])
            return item
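Note that ids_seen lives in memory, so it only prevents duplicates within a single crawl; when the spider restarts every 5 minutes, the set starts empty again. To avoid storing the same news link twice across runs, the dedup state has to be persistent, for example in the database itself. Below is a minimal sketch using SQLite; the class name, the news_links.db file name, and the item's 'link' field are assumptions for illustration, not something Scrapy provides:

import sqlite3

from scrapy.exceptions import DropItem

class SqliteDeduplicationPipeline(object):

    def open_spider(self, spider):
        # Assumed local database file; swap in your own storage.
        self.connection = sqlite3.connect('news_links.db')
        # The PRIMARY KEY constraint makes the database itself
        # reject a link that has already been stored.
        self.connection.execute(
            'CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)')

    def close_spider(self, spider):
        self.connection.commit()
        self.connection.close()

    def process_item(self, item, spider):
        # INSERT OR IGNORE silently skips rows that violate the
        # PRIMARY KEY constraint, i.e. links we have seen before.
        cursor = self.connection.execute(
            'INSERT OR IGNORE INTO links (url) VALUES (?)',
            (item['link'],))
        if cursor.rowcount == 0:
            # rowcount 0 means the insert was ignored: duplicate link.
            raise DropItem("Duplicate link found: %s" % item['link'])
        self.connection.commit()
        return item

Because the uniqueness check is enforced by the database, this works no matter how often the crawler runs. Remember to enable the pipeline in ITEM_PIPELINES in your project's settings.py.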
More info on pipelines is available in the Scrapy Item Pipeline documentation.
Upvotes: 2