Ankur Garg
Ankur Garg

Reputation: 2853

Improving performance for sql select query

So I am building a Notification System where there can be million subscribers for a particular topic (some string) . Say million Subscribers want to receive notification for topic "abc".

Now we are storing data for million subscribers inside a mysql database .

So for topic "abc" I want to retreive this list of million subscribers .

I am using Hibernate as my ORM here . So for selecting the list of subscribers, I am doing a select

select * from Subscription AS sub INNER JOIN Topic AS t ON sub.topicId = t.topicId 
INNER JOIN Subscriber AS sr ON sub.subscriberId= sr.subscriberId  
WHERE t.topic = 'abc' 

Considering that I have million of Subscribers in my database and this will be scanning over million of rows . I believe the query will take a long time to execute .

I am retreiving the list in my DAO(using Spring JPA) as

List<Subscription>subList = subRepository.findByTopicName(eventBean.getTopic())

I want my resultsets to get populated as fast as possible as the idea is to send notification in real time .

Considering the above premise , will calling database this way and considering million of rows are there will give desired performance ? My guess is no .

How can I optimize my query and how do I retreive it so that I can acheive real time performance ??

I know using indexes will increase performance but does how to retrieve all this data at once . Is it possible with Hibernate to store this much data in some cache ? Will using this caching be efficient ??

Also , will pagination help here ???

I am not looking for exact solutions here but just idea from people who have solved this kind of problem before .

Upvotes: 3

Views: 57

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269763

For this query:

select *
from Subscription AS sub INNER JOIN
     Topic AS t
     ON sub.topicId = t.topicId INNER JOIN
     Subscriber AS sr
     ON sub.subscriberId = sr.subscriberId  
WHERE t.topic = 'abc' ;

You want the following indexes: Topic(topic, topicId), Subscription(topicId, sub, subsriberId), and Subscriber(subsriberId).

The performance of the query is then going to be based on the volume of data being returned. Returning a million rows is a lot of rows, so that will be an important performance consideration.

Upvotes: 1

Related Questions