Reputation: 8177
I translated the problem into an employee/salary problem for simplicity.
Having an employee record emp
such as:
| id | salary (in 1000s) |
Given a number 'num
', find salary 'sal
' where the number of employees receiving salary<=sal
are >=num
(similar to area under curve problems in statistics).
We are using Python and SQLite, but the problem is not specific to them:
I'm doing the following (naive starting solution):
num = some_num
sal = 1000 # starting miminmum value
count = 0
while count < num:
sql = 'select count(*) from (select 1 from emp where salary<=? limit ?)'
# using limit so that we don't keep counting more than num - might help (?)
(count,) = cursor.execute(sql, (sal, num)).next() # using apsw sqlite adapter
sal += 1000
print sal
How can we make this more efficient? (Algorithmically or using standard SQL or equivalent, but not using quirks of a given system)
Or else: can it be made more efficient by adding extra fields to the record that can be kept up-to-date on insert/update ops without too much overhead?
Upvotes: 1
Views: 107
Reputation: 31952
If you are using a prepared statement, I believe you can move the preparation step out of the loop to make it much faster.
sql = 'select count(*) from (select 1 from emp where salary<=? limit ?)'
# using limit so that we don't keep counting more than num - might help (?)
while count < num:
(count,) = cursor.execute(sql, (sal, num))
sal += 1000
If you further want to improve performance and your db size is reasonably small, you could load the entire data into an array and do your operation.
I think further optimization is possible if you sort the array by salary first. After that you can do things like binary Search to where the <
condition flips and the index of that point + 1 would be the count.
The solution is simpler than it looks. If the records are sorted by salary then the #num'th
record's salary would be the desired answer, so this becomes a problem of selecting the n'th row:
num = some_num
sql = 'select salary from emp order by salary limit 1 offset ?'
(sal,) = cursor.execute(sql, (num-1,)).next()
print sal
Upvotes: 1