user3590149
user3590149

Reputation: 1605

SQL- How to grab a large list of data instead of iterating each query

I gather a list of items for each item I check the database with SQL query with the following code:

SELECT * 
FROM task_activity as ja 
join task as j on ja.task_id = j.id 
WHERE j.name = '%s' 
  AND ja.avg_runtime <> 0 
  AND ja.avg_runtime is not NULL 
  AND ja.id = (SELECT MAX(id) FROM task_activity 
               WHERE task_id = ja.task_id 
                 and avg_runtime <> 0 
                 AND ja.avg_runtime is not NULL) 
  % str(task.get('name'))).fetchall()

But do I need to iterate through the list and make a query for everyone. This list is quite large at times. Can I just make one query and get back a list data set? In this particular query I'm only looking for the column avg_runtime with the task id and the maximum id will be the last calculated runtime.

I don't have access to the database other then to make queries. Using Microsoft SQL Server 2012 (SP1) - 11.0.3349.0 (X64)

Upvotes: 0

Views: 181

Answers (1)

Laurence
Laurence

Reputation: 10976

You might be able to speed this up using row_number(). Note, I think there's a bug in your original query. Should ja.avg_runtime in the subquery just be avg_runtime?

sql = """with x as (
    select
        task_id,
        avg_runtime,
        id,
        row_number() over (partition by ja.task_id order by ja.id desc) rn
    from
        task_activity as ja 
            join 
        task as j 
            on ja.task_id = j.id 
    where
        j.name in ({0}) and
        ja.avg_runtime <> 0 and
        ja.avg_runtime is not null        
) select
    task_id,
    avg_runtime,
    id
from
    x
where
    rn = 1;"""

# build up ?,?,? for parameter substitution
# assume tasknames is the list containing the task names.
params = ",".join(tasknames.map(lambda x: "?"))
# connection is your db connection
cursor = connection.cursor()
# interpolate the ?,?,? and bind parameters
cursor.execute(sql.format(params), tasknames)
cursor.fetchall()

the following index should make this query pretty fast (although it depends how many rows are being excluded by the filters on ja.avg_runtime):

create index ix_task_id_id on task_activity (task_id, id desc);

Upvotes: 1

Related Questions