Hamid Shariati
Hamid Shariati

Reputation: 599

find 3rd max of salary in a large scale sql table

one of my job interview questions was this. I answered with order by and he ask me even in trillion rows? I accepted my mistake and say I guess there is an option for MAX() maybe!!

in these questions:

Using LIMIT within GROUP BY to get N results per group?

SQL select nth member of group

the trillion rows not considered and performance is low.

but after many search I found that even answered questions are not right for large scale rows. anybody know what is the correct query?

Upvotes: 0

Views: 106

Answers (4)

Hamid Shariati
Hamid Shariati

Reputation: 599

according to the mysql documentation: https://dev.mysql.com/doc/refman/8.0/en/partitioning-info.html

I think partitioning is the best way for huge data

Upvotes: 0

GMB
GMB

Reputation: 222432

This looks like a typical use case for window functions (if supported by the concerned RDBMS), so something like:

SELECT *
FROM (
    SELECT t.*, ROW_NUMBER() OVER(ORDER BY Salary DESC) rn
    FROM mytable t
) x 
WHERE rn = 3

This query would take advantage of an index on column Salary.

Bottom line : there is no way that you can get that result without scanning the whole table and ordering (or aggregating) the results. Indexing is the key to optimize that.

Upvotes: 0

THE LIFE-TIME LEARNER
THE LIFE-TIME LEARNER

Reputation: 1522

In Mysql:-

  SELECT *
       FROM tblSalary t1
       WHERE ( n) = 
       ( 
       SELECT COUNT( t2.columnSalary )
       FROM tblSalary t2
       WHERE t2.columnSalary >= t1.columnSalary
       )

where (n) you can put any number to return highest salary..

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269623

ORDER BY is fine. You just want an index on the column.

If I had a table with a trillion rows and one of the columns was called salary, I would question the data model or implementation. In any case, it would be in a highly parallel database. Such databases tend to have their quirks, but there are usually methods to extract the value.

Of course, what the interviewer was looking for is more like:

select max(t.col)
from t
where t.column < (select max(t2.col)
                  from t t2
                  where t2.col < (select max(t3.col) from t)
                 );

This would scan the table three times and has no intermediate steps for aggregation or sorting. But in practice, there are likely to be other solutions depending on the database.

Upvotes: 1

Related Questions