Jake
Jake

Reputation: 4660

HQL: Randomly query 100 out of 10000 records

I have the following HQL query

   Query q = session.createQuery
("Select  j.salaryStartRange From Job j where region like (:region) and
 uniquejobid in (:ujIds) and j.salaryStartRange > 10000 
 and (:degreeName) in elements(j.areasOfStudy) order by rand()");
    q.setParameterList("ujIds", ujIds);
    q.setParameter("region", region);
    q.setParameter("degreeName", degreeName);

It seems that the query time is the same no matter whether I use:

List<Integer> result = q.setFirstResult(0).setMaxResults(100).list();

or

List<Integer> result = q.list();

IN other words the full query of possible results is conducted in both queries. THe maxResults limit seems to be set only once the query is complete. I only want to query for a random selection of 100 records (out of potentially thousands of records). How do I create such a query?

In sum: How do I randomly query 100 out of 10000 records as efficiently as possible?

Upvotes: 0

Views: 655

Answers (1)

Naros
Naros

Reputation: 21113

In order for PostgreSQL to process ORDER BY RAND(), the database itself must have to fetch all rows in order to randomly select the rows in question. On small sized tables, you probably would never notice any speed concerns, but on larger tables you will.

If I were to insert 100 million rows into a table and issue the following query:

SELECT id, data FROM MyTable ORDER BY RANDOM() LIMIT 10

This particular query takes a number of seconds to process, only to return 10 rows. If you were to examine the explain plan, you'd see the Sort cost on 100 million rows.

You may find a solution with this post.

Upvotes: 1

Related Questions