Poko

Reputation: 41

More Efficient SQL query

Okay, I'm doing a data conversion. I've got 12 or so columns in my new database table for which I need to find only the most recent transaction for each product in the old database. Twelve times in my query I need a single value from a massive table of transactions.

To get the latest transaction I'm using a nested Inner Join to find the MAX. These nested queries are killing the efficiency, and the subtables are huge. My query takes about 20 seconds when pulling just a sample of 100 products, and I've got ~10,000. Any suggestions to make my nested queries more efficient? Here is an example of one of these nested Inner Joins.

   LEFT JOIN coverage_schedule CS_PL
     ON R.risk_pk = CS_PL.risk_fk
        AND CS_PL.coverage_type = 'HOMCVGE'
   LEFT JOIN ( Coverage_tran CT_PL
               JOIN (SELECT DISTINCT MAX(pct.coverage_tran_pk) AS recent,
                                     pct.coverage_schedule_fk,
                                     pcs.coverage_type
                     FROM   Coverage_tran pct
                            LEFT JOIN TRANSACTION pt
                              ON pct.transaction_fk = pt.transaction_pk
                            LEFT JOIN coverage_schedule pcs
                              ON pct.coverage_schedule_fk =
                                 pcs.po_coverage_schedule_pk
                                 AND pcs.coverage_type = 'HOMCVGE'
                     WHERE  pt.transaction_cycle = 'issued'
                            AND pct.cvg_status = 'Active'
                     GROUP  BY pct.coverage_schedule_fk,
                               pcs.coverage_type,
                               pt.transaction_cycle) mCT2
                 ON CT_PL.coverage_schedule_fk = mCT2.coverage_schedule_fk
                    AND CT_PL.coverage_tran_pk = mCT2.recent )
     ON CS_PL.coverage_schedule_pk = CT_PL.coverage_schedule_fk
        AND CT_PL.cvg_status = 'Active'  

I've tried restricting the nested table query to just transactions of the correct 'coverage type' (in this sample the coverage is called 'HOMCVGE'). I've also restricted it to just products with 'Active' status. I also tried removing the 'coverage' column from the subquery, since I don't need it as both a column and a restriction/rule for the join. I assumed restricting the sample size would make the query more efficient, but I see no noticeable effect. Do I just need to shrink the table size by adding more restrictions? Is it better to place the restrictions in the JOIN or in the WHERE clause?

The Execution Plan shows the nested query above has a cost of 11.6911 / 8%. It's about the most expensive thing in the whole query... but that just tells me it's slowing things down, and I already knew that.

Any hints, clues, or tools I can use? I admit I've taken no classes on SQL, so I haven't been taught any best practices and don't know the Big O costs. Heck, I just found the Execution Plan 10 minutes ago.

Update------------------- I was able to shave off several seconds by removing unnecessary columns and groupings, so thank you. I hope I can make it even more efficient. I also found that, when possible, it is better to add a restriction to the query in the WHERE clause rather than attaching it directly to the JOIN clause, as sketched below.
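To be precise about what I mean, here is a minimal sketch of the two placements, using the same tables as my example above (note that on a LEFT JOIN the two forms are not equivalent):

    -- Restriction attached to the JOIN: pt rows are filtered before joining,
    -- and unmatched pct rows are kept with NULLs (true LEFT JOIN behavior).
    SELECT pct.coverage_tran_pk,
           pt.transaction_cycle
    FROM   Coverage_tran pct
           LEFT JOIN TRANSACTION pt
             ON pct.transaction_fk = pt.transaction_pk
                AND pt.transaction_cycle = 'issued';

    -- Restriction in the WHERE clause: the filter runs after the join, so the
    -- NULL rows from unmatched pct records are discarded, and the LEFT JOIN
    -- behaves like an INNER JOIN (which is what I actually want here).
    SELECT pct.coverage_tran_pk,
           pt.transaction_cycle
    FROM   Coverage_tran pct
           LEFT JOIN TRANSACTION pt
             ON pct.transaction_fk = pt.transaction_pk
    WHERE  pt.transaction_cycle = 'issued';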

X-Zero, the way I see it, I have two options. Basically I am looking for the current features on a product. In the new database I'm importing the current features as individual columns. In the old database the features are stored in a giant transaction table that includes all the transactions on the product related to adding, removing, or just quoting a change to the features. Each feature on the product has its own transaction set in the table. If the product never had the feature, there will be no record at all (hence the use of left joins). I'm also not pulling every feature type in the table, just the 12. So I can either:

  1. Do as you say and use a CTE to make a simplified transaction table that contains only the most recent active transactions of only the relevant feature sets, and then pull from this table by product in my main query.

  2. Do as I'm doing now, where each feature has its own separate subquery/table of recent transactions within the main query. These tables ONLY contain one feature each. I tried making these subqueries identical, but it was coming out more efficient to make them unique because of the many feature types in the table that I'm not using at all.

I'm not sure which way is better. It sounds like the CTE might be the way to go (as long as I limit it to just the 12 features I want), but it's another thing to teach myself. My rough sketch of option 1 is below.
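Here is that rough, untested sketch of option 1, reusing the column names from my example above ('FEATURE2' is just a stand-in for the rest of the 12 types, which I'm not listing here):

    WITH recent_tran AS (
        -- One pass over the big transaction table: the latest issued, active
        -- transaction per coverage schedule, limited to the feature types I
        -- actually need. Plain JOINs are fine here because the WHERE clause
        -- would discard unmatched rows anyway.
        SELECT MAX(pct.coverage_tran_pk) AS recent,
               pct.coverage_schedule_fk,
               pcs.coverage_type
        FROM   Coverage_tran pct
               JOIN TRANSACTION pt
                 ON pct.transaction_fk = pt.transaction_pk
               JOIN coverage_schedule pcs
                 ON pct.coverage_schedule_fk = pcs.po_coverage_schedule_pk
        WHERE  pt.transaction_cycle = 'issued'
               AND pct.cvg_status = 'Active'
               AND pcs.coverage_type IN ('HOMCVGE', 'FEATURE2')  -- all 12 types
        GROUP  BY pct.coverage_schedule_fk,
                  pcs.coverage_type
    )
    -- The main query then LEFT JOINs against the CTE once per feature column.
    SELECT CS_PL.coverage_schedule_pk,
           CT_PL.coverage_tran_pk
    FROM   coverage_schedule CS_PL
           LEFT JOIN recent_tran rt
             ON rt.coverage_schedule_fk = CS_PL.coverage_schedule_pk
                AND rt.coverage_type = 'HOMCVGE'
           LEFT JOIN Coverage_tran CT_PL
             ON CT_PL.coverage_tran_pk = rt.recent;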

Upvotes: 4

Views: 168

Answers (2)

Stuart Ainsworth

Reputation: 12940

Get rid of the DISTINCT keyword in the subquery; MAX will only return one value per set of GROUPing columns, so the DISTINCT is redundant.
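With the DISTINCT dropped (and nothing else changed), the inner query from the question becomes:

    SELECT MAX(pct.coverage_tran_pk) AS recent,
           pct.coverage_schedule_fk,
           pcs.coverage_type
    FROM   Coverage_tran pct
           LEFT JOIN TRANSACTION pt
             ON pct.transaction_fk = pt.transaction_pk
           LEFT JOIN coverage_schedule pcs
             ON pct.coverage_schedule_fk = pcs.po_coverage_schedule_pk
                AND pcs.coverage_type = 'HOMCVGE'
    WHERE  pt.transaction_cycle = 'issued'
           AND pct.cvg_status = 'Active'
    GROUP  BY pct.coverage_schedule_fk,
              pcs.coverage_type,
              pt.transaction_cycle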

EDIT: Martin's comment below is correct, but it does raise the possibility of an error in the subquery. Do you need to group by all three columns or just the first two? If the former, then I retract my answer; if the latter, you need to clean up your sample :)

Upvotes: 5

Michael Goldshteyn

Reputation: 74340

Sometimes, moving sub-queries into local variables, so that they run as independent queries ahead of the queries that use them, is a way to improve performance, because it forces those sub-queries to execute only once. Of course, this is only possible for certain fixed queries. There is also the question of whether or not the multiple queries need to execute as a single atomic unit.

Also, do this only when the optimizer doesn't do a good job.
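For instance, a minimal sketch in T-SQL, assuming the question's schema (the scalar is computed once and reused; wrap both statements in a transaction if atomicity matters):

    -- Compute the single value once, ahead of the query that needs it.
    DECLARE @recent INT;

    SELECT @recent = MAX(pct.coverage_tran_pk)
    FROM   Coverage_tran pct
           JOIN TRANSACTION pt
             ON pct.transaction_fk = pt.transaction_pk
    WHERE  pt.transaction_cycle = 'issued'
           AND pct.cvg_status = 'Active';

    -- The main query now reuses @recent instead of re-running the sub-query.
    SELECT CT.coverage_tran_pk,
           CT.coverage_schedule_fk
    FROM   Coverage_tran CT
    WHERE  CT.coverage_tran_pk = @recent;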

Upvotes: 1
