Crypticlight
Crypticlight

Reputation: 47

Need suggestions to further optimize this SQL Server query

I wrote this very simple query and it has been running for over 16 hours now. I have tried all ways to optimize it but have no clue on how can it be improved further. Please help.

select 
    a.*, b.TM_IN_XIT 
into
    scratch.dbo.ab_peak2 
from 
    scratch.dbo.ab_peak1 a 
left join 
    Scratch.dbo.ab_tnt b on a.DC_ZIP between b.ORIG_ZIP_low 
                         and b.ORIG_ZIP_high 
                         and a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high  
                         and a.Carr_Mode = b.Mode;

Table scratch.dbo.ab_peak1 looks as below and has ~700k records:

+-----------------------------------------------+
| ShipmentNumber  DC_ZIP  Destination Carr_Mode |
+-----------------------------------------------+
| 252838748       60622       10016      A      |
| 252731857       60622       40517      A      |
| 252685087       60622       91601      B      |
| 252574905       60622       7017       B      |
| 252877256       60622       97230      A      |
| 254791362       20166       54971      B      |
| 255866277       60622       19131      A      |
| 255728088       60622       27713      B      |
| 255614555       60622       10009      A      |
| 255823071       60622       33556      B      |
+-----------------------------------------------+

Table Scratch.dbo.ab_tnt looks as below and has ~1.5M records:

+-----------------------------------------------------------------------------------+
| Mode    ORIG_ZIP_low    ORIG_ZIP_high   DEST_ZIP_low    DEST_ZIP_high   TM_IN_XIT |
+-----------------------------------------------------------------------------------+
|   A        41042            41042          62556            62556          2      |
|   B        41042            41042          62556            62556          3      |
|   A        41042            41042          62557            62557          1      |
|   B        41042            41042          62557            62557          2      |
|   A        41042            41042          62558            62563          2      |
|   B        41042            41042          62558            62563          3      |
|   A        41042            41042          62565            62567          1      |
|   B        41042            41042          62565            62567          2      |
|   A        41042            41042          62568            62570          2      |
|   B        41042            41042          62568            62570          3      |
+-----------------------------------------------------------------------------------+

What I'm trying to achieve - "a" is a Shipments table and "b" is a table that has transit times for all origin-destination combos. The structure of "b" table is based on zip ranges as shown above. I'm trying to bring in transit time for each shipment of "a" table by looking up in "b" table.

I have tried following already:

  1. Retained zipcodes in TNT table >= min(zipcodes in PEAK1 table) and <= max(zipcodes in PEAK1 table)
  2. Created indices on all columns of TNT table.

Any other suggestions?

Upvotes: 1

Views: 74

Answers (2)

Malk
Malk

Reputation: 11983

Just a thought, but have you tried enumerating all possible combinations in ab_tnt, and using the result to join into ab_peak1? Here is a sample using common table expressions, but a temp table might work better. Also, this answer assumes you have an Integers table.

;with dest as (
    select dest_zip = i.I, tnt.Mode, orig_zip_low, orig_zip_high, tm_in_xit
    from Integers i
         join ab_tnt tnt on i.I between tnt.dest_zip_low and tnt.dest_zip_high
)
, tnt as (
    select dest.Mode, dest_zip, orig_zip = i.I, dest.tm_in_xit
    from Integers i
         join dest on i.I between dest.orig_zip_low and dest.orig_zip_high

)
select * 
from ab_peak1
    join tnt 
         on ab_peak1.DC_ZIP = tnt.orig_zip
        and ab_peak1.Destination = tnt.dest_zip
        and ab_peak1.Carr_Mode = tnt.Mode

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269463

For this query:

select a.*, b.TM_IN_XIT 
into scratch.dbo.ab_peak2 
from scratch.dbo.ab_peak1 a left join
     Scratch.dbo.ab_tnt b
     on a.DC_ZIP between b.ORIG_ZIP_low and b.ORIG_ZIP_high and
        a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high and
        a.Carr_Mode = b.Mode;

You first want to try indexes. The obvious one to try is ab_tnt(mode, orig_zip_low, orig_zip_high, dest_zip_low, dest_zip_high, tm_in_xit).

I might also be inclined to write the query this way:

select a.*,
       (case when a.DC_ZIP <= b.ORIG_ZIP_high and a.destination <= b.DESC_ZIP_high
             then b.TM_IN_XIT
        end) as TM_IN_XIT
into scratch.dbo.ab_peak2 
from scratch.dbo.ab_peak1 a outer apply
     (select top 1 b.*
      from Scratch.dbo.ab_tnt b
      where a.DC_ZIP >= b.ORIG_ZIP_low and
            a.Destination >= b.DEST_ZIP_low and
            a.Carr_Mode = b.Mode
      order by b.ORIG_ZIP_low, b.DEST_ZIP_low
     ) b;

Now, this isn't exactly the same query. It returns the first potentially matching zip code. The idea is that the subquery might make decent use of an index on ab_tnt(mode, b.ORIG_ZIP_low, b.DEST_ZIP_low).

The case statement then determines if there is really a match.

I have used this logic very successfully on one dimension (handling IP ranges, for instance). I haven't used it for two dimensions, but it is worth trying if your current query has been running for the better part of a day.

Note: You can test the performance by running on a subset of records by using (select top 1000 * from scratch.dbo.ab_peak1) a instead of the table.

Upvotes: 1

Related Questions