Reputation: 426
Using SQL Server I have two tables, below sample Table #T1 in DB has well over a million rows, Table #T2 has 100 rows. Both tables are in Column format and I need to Pivot to rows and join both.
Can I get it all in one query with Cross Apply and remove the cte?
This is my code, I have correct output but is this the most efficient way to do this considering number of rows?
with cte_sizes
as
(
select SizeRange,Size,ColumnPosition
from #T2
cross apply (
values(Sz1,1),(Sz2,2),(Sz3,3),(Sz4,4)
) X (Size,ColumnPosition)
)
select a.ProductID,a.SizeRange,c.Size,isnull(x.Qty,0) as Qty
from #T1 a
cross apply (
values(a.Sale1,1),(a.Sale2,2),(a.Sale3,3),(a.Sale4,4)
) X (Qty,ColumnPosition)
inner join cte_sizes c
on c.SizeRange = a.SizeRange
and c.ColumnPosition = x.ColumnPosition
I have also code and considered this but is this the CROSS APPLY a better method?
with cte_sizes
as
(
select 1 as SizePos
union all
select SizePos + 1 as SizePos
from cte_sizes
where SizePos < 4
)
select a.ProductID
,a.SizeRange
,(case when b.SizePos = 1 then c.Sz1
when b.SizePos = 2 then c.Sz2
when b.SizePos = 3 then c.Sz3
when b.SizePos = 4 then c.Sz4 end
) as Size
,isnull((case when b.SizePos = 1 then a.Sale1
when b.SizePos = 2 then a.Sale2
when b.SizePos = 3 then a.Sale3
when b.SizePos = 4 then a.Sale4 end
),0) as Qty
from #T1 a
inner join #T2 c on c.SizeRange = a.SizeRange
cross join cte_sizes b
Upvotes: 2
Views: 866
Reputation: 67311
This is wild guessing, but my magic crystall ball told me, that you might be looking for something like this:
For this we do not need your table #TS
at all.
WITH Unpivoted2 AS
(
SELECT t2.SizeRange,A.* FROM #t2 t2
CROSS APPLY(VALUES(1,t2.Sz1)
,(2,t2.Sz2)
,(3,t2.Sz3)
,(4,t2.Sz4)) A(SizePos,Size)
)
SELECT t1.ProductID
,Unpivoted2.SizeRange
,Unpivoted2.Size
,Unpivoted1.Qty
FROM #t1 t1
CROSS APPLY(VALUES(1,t1.Sale1)
,(2,t1.Sale2)
,(3,t1.Sale3)
,(4,t1.Sale4)) Unpivoted1(SizePos,Qty)
LEFT JOIN Unpivoted2 ON Unpivoted1.SizePos=Unpivoted2.SizePos AND t1.SizeRange=Unpivoted2.SizeRange
ORDER BY t1.ProductID,Unpivoted2.SizeRange;
The result:
+-----------+-----------+------+------+
| ProductID | SizeRange | Size | Qty |
+-----------+-----------+------+------+
| 123 | S-XL | S | 1 |
+-----------+-----------+------+------+
| 123 | S-XL | M | 12 |
+-----------+-----------+------+------+
| 123 | S-XL | L | 13 |
+-----------+-----------+------+------+
| 123 | S-XL | XL | 14 |
+-----------+-----------+------+------+
| 456 | 8-14 | 8 | 2 |
+-----------+-----------+------+------+
| 456 | 8-14 | 10 | 22 |
+-----------+-----------+------+------+
| 456 | 8-14 | 12 | NULL |
+-----------+-----------+------+------+
| 456 | 8-14 | 14 | 24 |
+-----------+-----------+------+------+
| 789 | S-L | S | 3 |
+-----------+-----------+------+------+
| 789 | S-L | M | NULL |
+-----------+-----------+------+------+
| 789 | S-L | L | 33 |
+-----------+-----------+------+------+
| 789 | S-L | XL | NULL |
+-----------+-----------+------+------+
The idea in short:
The cte will return your #T2
in an unpivoted structure. Each name-numbered column (something you should avoid) is return as a single row with an index indicating the position.
The SELECT will do the same with #T1
and join the cte against this set.
If I get this (and the changes to the initial question) correctly, the approach above works perfectly well, but you want to know, what was best in performance.
The first answer to "What is the fastest approach?" is Race your horses by Eric Lippert.
Good to know 1: A CTE is nothing more then syntactic sugar. It will allow to type a sub-query once and use it like a table, but it has no effect to the way how the engine will work this down.
Good to know 2: It is a huge difference whether you use APPLY
or JOIN
. The first will call the sub-source once per row, using the current row's values. The second will have to create two sets first and will then join them by some condition. There is no general "what is better"...
For your issue: As there is one very big set and one very small set, all depends on when you reduce the big set usig any kind of filter. The earlier the better.
And most important: It is - in any case - a sign of bad structures - when you find name numbering (something like phone1
, phone2
, phoneX
). The most expensive work will be to transform your 4 name-numbered columns to some dedicated rows. This should be stored in normalized format...
If you still need help, I'd ask you to start a new question.
Upvotes: 1