Reputation: 1
I'm using SQL Server 2016 and I'm having an issue grouping by more than one col and finding an average while omitting duplicate rows. I have a transaction table defined as:
CREATE TABLE [dbo].[CUST_TRANSACTION](
[EXTRACT_DATE] [date] NULL,
[CUSTOMER_ID] [bigint] NULL,
[TRANS_NUMBER] [bigint] NULL,
[CATEGORY] [smallint] NULL,
[RANKING] [smallint] NULL )
Here is some data:
EXTRACT_DATE CUSTOMER_ID TRANS_NUMBER CATEGORY RANKING
10/31/2017 10001 1000101 4 100
10/31/2017 10001 1000102 4 100
10/31/2017 10002 1000201 4 200
10/31/2017 10001 1000103 5 100
10/31/2017 10003 1000301 5 300
10/31/2017 10003 1000302 5 300
10/31/2017 10004 1000401 7 500
10/31/2017 10001 1000104 8 100
The Customer_Id AND TRANS_NUMBER combo needs to be unique, but a customer_id can have 1 to Many Trans_Numbers and a Customer_Id can exist in 1 to many Categories. From the data I reviewed, the Ranking for a Customer_ID seems to be the same for a given EXTRACT_DATE. I found no NULLS in the Ranking, but I did find zeroes, so I need to exclude any zeroes from the Average.
The request is to generate a report broken down by each Category ( 1 - 15) and find the Average Ranking within that Category, but to only count a customer_id once and also find the Max Ranking with that Category. This is for a given EXTRACT_Date.
So I ran the following:
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
group by CATEGORY
order by CATEGORY
Generated the following output:
CATEGORY Max Ranking Average Ranking
4 200 133
5 300 233
7 500 500
8 100 100
But Category 4 should have an Average of 150 since customer_Id = 10001 has two entries and Category 5 should be = 200 since Customer_id 10003 has two entries.
When I tried to Group by both Category, Customer_Id, the output includes each combination of Category and Customer_Id, which is what Group by does. So I'm not sure if I need a sub-select or any other ideas?
Thanks
Upvotes: 0
Views: 1096
Reputation: 27508
Due to different requirements of overall average and maximum you can't use a single column to get both. A sub-select will deliver one column for averaging and another for maximum'ing.
DECLARE @QUERY_DATE DATE = '2017-10-31';
Select
CATEGORY
, MAX(RANKING_detail_max) "Max Ranking"
, AVG(RANKING_detail_sum) "Average Ranking"
from (
select CATEGORY
, CUSTOMER_ID
, SUM(RANKING) RANKING_detail_sum
, MAX(RANKING) RANKING_detail_max
from CUST_TRANSACTION
where EXTRACT_DATE = @QUERY_DATE
and RANKING > 0
group by CATEGORY, CUSTOMER_ID
) rollup
group by CATEGORY
order by CATEGORY
Upvotes: 0
Reputation: 9460
You can use Common Table Expression (CTE) to filter out duplicate customerID in a category. Something like this.
;with cte as (
select CATEGORY, RANKING, EXTRACT_DATE
ROW_NUMBER() over(partition by category, customer_id order by customer_id) rn
from CUST_TRANSACTION
)
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from cte --CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
and rn = 1
group by CATEGORY
order by CATEGORY
Upvotes: 0
Reputation: 329
it looks like you don't care about the trans_number mappings, so you could remove it and choose distinct remaining values in a derived table:
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from ( select distinct [EXTRACT_DATE] ,
[CUSTOMER_ID] ,
[CATEGORY] ,
[RANKING] from CUST_TRANSACTION )CUST_TRANSACTION
where EXTRACT_DATE = Convert(datetime, '2017-10-31' )
and RANKING > 1
group by CATEGORY
order by CATEGORY
Upvotes: 1