Reputation: 838
What I need to do is get a Cost breakout for each grouping, aggregated by day. Also, only taking the top N per the whole date range. I'm probably not explaining this well so let me give examples. Say my table schema and data looks like this:
SoldDate Product State Cost
----------------------- --------------------- --------- ------
2017-07-11 01:00:00.000 Apple NY 6
2017-07-11 07:00:00.000 Banana NY 1
2017-07-11 07:00:00.000 Banana NY 1
2017-07-12 01:00:00.000 Pear NY 2
2017-07-12 03:00:00.000 Olive TX 1
2017-07-12 16:00:00.000 Banana NY 1
2017-07-13 22:00:00.000 Apple NY 6
2017-07-13 22:00:00.000 Apple NY 6
2017-07-13 23:00:00.000 Banana NY 1
Call this table SoldProduce. Now what I'm looking for is to group by Day, Product and State but for each day, only take the top two of the group NOT the top of that particular day. Anything else gets lumped under 'other'. So in this case, our top two groups with the greatest Cost are Apple-NY and Banana-NY. So those are the two that should show up in the output only. Anything else is under 'Other'
So in the end this is the desired output:
SoldDay Product State Total Cost
----------------------- --------------------- --------- ------
2017-07-11 00:00:00.000 Apple NY 6
2017-07-11 00:00:00.000 Banana NY 2
2017-07-11 00:00:00.000 OTHER OTHER 0
2017-07-12 00:00:00.000 OTHER OTHER 3
2017-07-12 00:00:00.000 Banana NY 1
2017-07-13 00:00:00.000 Apple NY 12
2017-07-13 00:00:00.000 Banana NY 1
2017-07-13 00:00:00.000 OTHER OTHER 0
Note how on the 12th Pear and Olive were lumped under other. Even though it outsold Banana on that day. This is because I want the Top N selling groups for the whole range, not just on a day by day basis.
I did a lot of googleing a way to make a query to get this data but I'm not sure if it's the best way:
WITH TopX AS
(
SELECT
b.Product,
b.State,
b.SoldDate,
b.Cost,
DENSE_RANK() OVER (ORDER BY GroupedCost DESC) as [Rank]
FROM
(
SELECT
b.Product,
b.State,
b.SoldDate,
b.Cost,
SUM(b.Cost) OVER (PARTITION BY b.Product, b.State) as GroupedCost
FROM
SoldProduce b WITH (NOLOCK)
) as b
)
SELECT
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
b.Product,
b.State,
SUM(b.Cost)
FROM
TopX b
WHERE
[Rank] <= 2
GROUP BY
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
b.Product,
b.State
UNION ALL
SELECT
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
null,
null,
SUM(b.Cost)
from
TopX b
WHERE
[Rank] > 2
GROUP BY
DATEADD(d,DATEDIFF(d,0,SoldDate),0)
Step 1) Create a common query that first projects the cost that the row would be has we just grouped by Product and State. Then it does a second projection to rank that cost 1-N where 1 has the greatest grouped cost.
Step 2) Call upon the common query, grouping by day and restricting to rows <= 2. This is the Top elements. Then union the other category to this, or anything ranked > 2.
What do you guys think? Is this an efficient solution? Could I do this better?
Edit:
FuzzyTrees suggestion benchmarks better than mine.
Final query used:
WITH TopX AS
(
SELECT
TOP(2)
b.Product,
b.State
FROM
SoldProduce b
GROUP BY
b.Product,
b.State
ORDER BY
SUM(b.Cost)
)
SELECT
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
coalesce(b.Product, 'Other') Product,
coalesce(b.State, 'Other') State,
SUM(b.Cost)
FROM
SoldProduce a
LEFT JOIN TopX b ON
(a.Product = b.Product OR (a.Product IS NULL AND b.Product IS NULL)) AND
(a.State = b.State OR (a.State IS NULL AND b.State IS NULL))
GROUP BY
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
coalesce(b.Product, 'Other') Product,
coalesce(b.State, 'Other') State,
ORDER BY DATEADD(d,DATEDIFF(d,0,SoldDate),0)
-- Order by optional. Just for display purposes.
--More effienct to order in code for the final product.
--Don't use I/O if you don't have to :)
Upvotes: 2
Views: 218
Reputation: 32392
I suggest using a plain group by
without window functions for your TopX
view:
With TopX AS
(
select top 2 Product, State
from SoldProduce
group by Product, State
order by sum(cost) desc
)
Then you can left join to your TopX
view and use coalesce
to determine which products fall into the Other
group
select
coalesce(TopX.Product, 'Other') Product,
coalesce(TopX.State, 'Other') State,
sum(Cost),
sp.SoldDate
from SoldProduce sp
left join TopX on TopX.Product = sp.Product
and TopX.State = sp.State
group by
coalesce(TopX.Product, 'Other'),
coalesce(TopX.State, 'Other'),
SoldDate
order by SoldDate
Note: This query will not return 0 counts
Upvotes: 1