Reputation: 2853
So I am building a Notification System where there can be million subscribers for a particular topic (some string) . Say million Subscribers want to receive notification for topic "abc".
Now we are storing data for million subscribers inside a mysql database .
So for topic "abc" I want to retreive this list of million subscribers .
I am using Hibernate as my ORM here . So for selecting the list of subscribers, I am doing a select
select * from Subscription AS sub INNER JOIN Topic AS t ON sub.topicId = t.topicId
INNER JOIN Subscriber AS sr ON sub.subscriberId= sr.subscriberId
WHERE t.topic = 'abc'
Considering that I have million of Subscribers in my database and this will be scanning over million of rows . I believe the query will take a long time to execute .
I am retreiving the list in my DAO(using Spring JPA) as
List<Subscription>subList = subRepository.findByTopicName(eventBean.getTopic())
I want my resultsets to get populated as fast as possible as the idea is to send notification in real time .
Considering the above premise , will calling database this way and considering million of rows are there will give desired performance ? My guess is no .
How can I optimize my query and how do I retreive it so that I can acheive real time performance ??
I know using indexes will increase performance but does how to retrieve all this data at once . Is it possible with Hibernate to store this much data in some cache ? Will using this caching be efficient ??
Also , will pagination help here ???
I am not looking for exact solutions here but just idea from people who have solved this kind of problem before .
Upvotes: 3
Views: 57
Reputation: 1269763
For this query:
select *
from Subscription AS sub INNER JOIN
Topic AS t
ON sub.topicId = t.topicId INNER JOIN
Subscriber AS sr
ON sub.subscriberId = sr.subscriberId
WHERE t.topic = 'abc' ;
You want the following indexes: Topic(topic, topicId)
, Subscription(topicId, sub, subsriberId)
, and Subscriber(subsriberId)
.
The performance of the query is then going to be based on the volume of data being returned. Returning a million rows is a lot of rows, so that will be an important performance consideration.
Upvotes: 1