Hi, I am designing a system that takes in article links from an API, sorts the articles into categories, and then sends each user a list of recommended article links based on that user's specified filtering parameters.
The initial approach I've planned is to use SQL databases to store the sorted articles as well as user info. Then, each day, I will run a SQL query on the article database for each user to fetch the relevant article links. One thing I still need to figure out is how to handle duplicate articles and users, but even assuming every instance is unique, this approach seems pretty inefficient.
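For reference, the baseline described above might look roughly like the sketch below, using Python's built-in `sqlite3`. It is only a sketch under assumptions: the schema (`articles`, `users`, a single `category_filter` per user, a `fetched_on` date column) is hypothetical. `UNIQUE` constraints plus `INSERT OR IGNORE` are one common way to handle the duplicate-article/duplicate-user concern, and the loop in `daily_digest` shows why cost grows with the number of users: it issues one query per user per day.

```python
import sqlite3

# Hypothetical schema: one category per article, one category filter per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,        -- UNIQUE rejects duplicate article links
    category TEXT NOT NULL,
    fetched_on TEXT NOT NULL         -- ISO date the article was ingested
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,      -- UNIQUE rejects duplicate users
    category_filter TEXT NOT NULL
);
CREATE INDEX idx_articles_cat_date ON articles (category, fetched_on);
""")

def ingest(url, category, day):
    # INSERT OR IGNORE silently skips an article whose URL is already stored.
    conn.execute(
        "INSERT OR IGNORE INTO articles (url, category, fetched_on) VALUES (?, ?, ?)",
        (url, category, day),
    )

def daily_digest(day):
    """Naive approach: one query per user per day."""
    digest = {}
    users = conn.execute("SELECT id, category_filter FROM users").fetchall()
    for user_id, category in users:
        rows = conn.execute(
            "SELECT url FROM articles WHERE category = ? AND fetched_on = ? ORDER BY url",
            (category, day),
        ).fetchall()
        digest[user_id] = [url for (url,) in rows]
    return digest

# Small demo with made-up data.
conn.executemany(
    "INSERT INTO users (email, category_filter) VALUES (?, ?)",
    [("a@example.com", "tech"), ("b@example.com", "sports")],
)
ingest("https://example.com/a1", "tech", "2024-01-01")
ingest("https://example.com/a1", "tech", "2024-01-01")  # duplicate, ignored
ingest("https://example.com/s1", "sports", "2024-01-01")
print(daily_digest("2024-01-01"))
```

With millions of users this becomes millions of indexed lookups per day, which is the inefficiency the question is worried about.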
I was wondering whether there is a better way to design the system for scale, i.e., if it has to handle millions of articles and millions of users?
Would grouping users by similar article filtering parameters help (so that fewer queries need to run when two or more users would issue the same query against the article database)? Or would that effort be too complicated to be worthwhile?
So the user specifies the filters themselves, and new articles matching those filters should be sent out? That sounds more like "alert me when new articles arrive".
Some ideas off the top of my head:
If the number of articles is much larger than the number of users, invert the logic: for every new article, check which users' filters match it and append the article to an alert channel for each of those users. (Per new article the complexity is O(n), where n is the number of users.)
If filter evaluation can easily be normalized (and split into filter parts), store the filters separately and reference from each filter to the users who use it. Then you only need to evaluate whether a new article matches each filter. (Per new article the complexity is O(n), where n is the number of distinct filters.)
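The first idea (inverting the logic so each incoming article is pushed to matching users, instead of each user pulling articles once a day) could be sketched as follows. This is an illustrative sketch, not a prescribed design: the `Article`/`UserFilter` shapes, the keyword rule, and the in-memory `channels` dict standing in for a real alert channel (a queue, a mailbox table, etc.) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Article:
    url: str
    category: str
    title: str

@dataclass
class UserFilter:
    # Hypothetical filter shape: allowed categories plus optional
    # title keywords (assumed to be given in lowercase).
    categories: frozenset
    keywords: frozenset = frozenset()

    def matches(self, article):
        if article.category not in self.categories:
            return False
        if not self.keywords:
            return True
        words = set(article.title.lower().split())
        return bool(words & self.keywords)

class AlertRouter:
    def __init__(self):
        self.filters = {}                  # user id -> UserFilter
        self.channels = defaultdict(list)  # user id -> pending article URLs
        self.seen_urls = set()             # used to drop duplicate articles

    def subscribe(self, user_id, user_filter):
        self.filters[user_id] = user_filter

    def on_new_article(self, article):
        # Dedup by URL so the same article is never delivered twice.
        if article.url in self.seen_urls:
            return
        self.seen_urls.add(article.url)
        # O(n) in the number of users: every user's filter is checked once.
        for user_id, user_filter in self.filters.items():
            if user_filter.matches(article):
                self.channels[user_id].append(article.url)

# Small demo with made-up data.
router = AlertRouter()
router.subscribe("alice", UserFilter(frozenset({"tech"})))
router.subscribe("bob", UserFilter(frozenset({"tech", "science"}), frozenset({"rust"})))
router.on_new_article(Article("https://example.com/1", "tech", "Rust 2.0 released"))
router.on_new_article(Article("https://example.com/2", "tech", "New GPU benchmarks"))
router.on_new_article(Article("https://example.com/1", "tech", "Rust 2.0 released"))  # duplicate
print(dict(router.channels))
```

The per-article cost is one filter check per user, so this pays off when articles arrive far less often than a daily per-user query sweep would run.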
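The second idea (normalizing filters so users with equivalent filters share one stored filter) is also essentially what the question's "grouping users by similar filtering parameters" would amount to. A minimal sketch, assuming a hypothetical filter format of `(field, value)` pairs that must all match, and articles represented as plain dicts of strings:

```python
from collections import defaultdict

def normalize(filter_parts):
    """Canonical key for a filter: lowercased, deduplicated, sorted (field, value) pairs.
    Hypothetical format: every (field, value) pair must match the article."""
    return tuple(sorted({(f.lower(), v.lower()) for f, v in filter_parts}))

class FilterRegistry:
    def __init__(self):
        # Each distinct filter is stored once and references all users sharing it.
        self.users_by_filter = defaultdict(set)

    def subscribe(self, user_id, filter_parts):
        self.users_by_filter[normalize(filter_parts)].add(user_id)

    @staticmethod
    def matches(key, article):
        # article: dict of field -> string value.
        return all(article.get(f, "").lower() == v for f, v in key)

    def route(self, article):
        """Evaluate each distinct filter once, then fan out to its users.
        O(n) in the number of distinct filters, not the number of users."""
        deliveries = defaultdict(list)
        for key, users in self.users_by_filter.items():
            if self.matches(key, article):
                for user_id in users:
                    deliveries[user_id].append(article["url"])
        return dict(deliveries)

# Small demo with made-up data.
reg = FilterRegistry()
reg.subscribe("alice", [("category", "Tech"), ("lang", "en")])
reg.subscribe("bob", [("lang", "EN"), ("category", "tech")])  # same filter, different order/case
reg.subscribe("carol", [("category", "sports")])
out = reg.route({"url": "https://example.com/1", "category": "tech", "lang": "en"})
print(len(reg.users_by_filter), out)
```

How much this saves depends entirely on how many users end up with identical normalized filters; if almost every filter is unique, it degrades to the per-user approach.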
General:
Other Ideas:
And in general, increase the complexity of your evaluation only once it's needed (it's OK to start simpler, with an algorithm that doesn't scale perfectly, if it works for your case).