Reputation: 47
I wrote this very simple query and it has been running for over 16 hours now. I have tried all ways to optimize it but have no clue on how can it be improved further. Please help.
select
a.*, b.TM_IN_XIT
into
scratch.dbo.ab_peak2
from
scratch.dbo.ab_peak1 a
left join
Scratch.dbo.ab_tnt b on a.DC_ZIP between b.ORIG_ZIP_low
and b.ORIG_ZIP_high
and a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high
and a.Carr_Mode = b.Mode;
Table scratch.dbo.ab_peak1
looks as below and has ~700k records:
+-----------------------------------------------+
| ShipmentNumber DC_ZIP Destination Carr_Mode |
+-----------------------------------------------+
| 252838748 60622 10016 A |
| 252731857 60622 40517 A |
| 252685087 60622 91601 B |
| 252574905 60622 7017 B |
| 252877256 60622 97230 A |
| 254791362 20166 54971 B |
| 255866277 60622 19131 A |
| 255728088 60622 27713 B |
| 255614555 60622 10009 A |
| 255823071 60622 33556 B |
+-----------------------------------------------+
Table Scratch.dbo.ab_tnt
looks as below and has ~1.5M records:
+-----------------------------------------------------------------------------------+
| Mode ORIG_ZIP_low ORIG_ZIP_high DEST_ZIP_low DEST_ZIP_high TM_IN_XIT |
+-----------------------------------------------------------------------------------+
| A 41042 41042 62556 62556 2 |
| B 41042 41042 62556 62556 3 |
| A 41042 41042 62557 62557 1 |
| B 41042 41042 62557 62557 2 |
| A 41042 41042 62558 62563 2 |
| B 41042 41042 62558 62563 3 |
| A 41042 41042 62565 62567 1 |
| B 41042 41042 62565 62567 2 |
| A 41042 41042 62568 62570 2 |
| B 41042 41042 62568 62570 3 |
+-----------------------------------------------------------------------------------+
What I'm trying to achieve - "a" is a Shipments table and "b" is a table that has transit times for all origin-destination combos. The structure of "b" table is based on zip ranges as shown above. I'm trying to bring in transit time for each shipment of "a" table by looking up in "b" table.
I have tried following already:
Any other suggestions?
Upvotes: 1
Views: 74
Reputation: 11983
Just a thought, but have you tried enumerating all possible combinations in ab_tnt
, and using the result to join into ab_peak1
? Here is a sample using common table expressions, but a temp table might work better. Also, this answer assumes you have an Integers
table.
;with dest as (
select dest_zip = i.I, tnt.Mode, orig_zip_low, orig_zip_high, tm_in_xit
from Integers i
join ab_tnt tnt on i.I between tnt.dest_zip_low and tnt.dest_zip_high
)
, tnt as (
select dest.Mode, dest_zip, orig_zip = i.I, dest.tm_in_xit
from Integers i
join dest on i.I between dest.orig_zip_low and dest.orig_zip_high
)
select *
from ab_peak1
join tnt
on ab_peak1.DC_ZIP = tnt.orig_zip
and ab_peak1.Destination = tnt.dest_zip
and ab_peak1.Carr_Mode = tnt.Mode
Upvotes: 0
Reputation: 1269463
For this query:
select a.*, b.TM_IN_XIT
into scratch.dbo.ab_peak2
from scratch.dbo.ab_peak1 a left join
Scratch.dbo.ab_tnt b
on a.DC_ZIP between b.ORIG_ZIP_low and b.ORIG_ZIP_high and
a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high and
a.Carr_Mode = b.Mode;
You first want to try indexes. The obvious one to try is ab_tnt(mode, orig_zip_low, orig_zip_high, dest_zip_low, dest_zip_high, tm_in_xit)
.
I might also be inclined to write the query this way:
select a.*,
(case when a.DC_ZIP <= b.ORIG_ZIP_high and a.destination <= b.DESC_ZIP_high
then b.TM_IN_XIT
end) as TM_IN_XIT
into scratch.dbo.ab_peak2
from scratch.dbo.ab_peak1 a outer apply
(select top 1 b.*
from Scratch.dbo.ab_tnt b
where a.DC_ZIP >= b.ORIG_ZIP_low and
a.Destination >= b.DEST_ZIP_low and
a.Carr_Mode = b.Mode
order by b.ORIG_ZIP_low, b.DEST_ZIP_low
) b;
Now, this isn't exactly the same query. It returns the first potentially matching zip code. The idea is that the subquery might make decent use of an index on ab_tnt(mode, b.ORIG_ZIP_low, b.DEST_ZIP_low)
.
The case
statement then determines if there is really a match.
I have used this logic very successfully on one dimension (handling IP ranges, for instance). I haven't used it for two dimensions, but it is worth trying if your current query has been running for the better part of a day.
Note: You can test the performance by running on a subset of records by using (select top 1000 * from scratch.dbo.ab_peak1) a
instead of the table.
Upvotes: 1