Reputation: 1

Average using SQL Group by needs to omit duplicates and group by more than one column

I'm using SQL Server 2016 and I'm having an issue grouping by more than one col and finding an average while omitting duplicate rows. I have a transaction table defined as:

CREATE TABLE [dbo].[CUST_TRANSACTION](  
     [EXTRACT_DATE] [date] NULL,
     [CUSTOMER_ID] [bigint] NULL,   
     [TRANS_NUMBER] [bigint] NULL,
     [CATEGORY] [smallint] NULL,    
     [RANKING] [smallint] NULL )

Here is some data:

EXTRACT_DATE    CUSTOMER_ID TRANS_NUMBER    CATEGORY    RANKING
10/31/2017          10001   1000101              4  100
10/31/2017          10001   1000102              4  100
10/31/2017          10002   1000201              4  200
10/31/2017          10001   1000103              5  100
10/31/2017          10003   1000301              5  300
10/31/2017          10003   1000302              5  300
10/31/2017          10004   1000401              7  500
10/31/2017          10001   1000104              8  100

The Customer_Id AND TRANS_NUMBER combo needs to be unique, but a customer_id can have 1 to Many Trans_Numbers and a Customer_Id can exist in 1 to many Categories. From the data I reviewed, the Ranking for a Customer_ID seems to be the same for a given EXTRACT_DATE. I found no NULLS in the Ranking, but I did find zeroes, so I need to exclude any zeroes from the Average.

The request is to generate a report broken down by each Category ( 1 - 15) and find the Average Ranking within that Category, but to only count a customer_id once and also find the Max Ranking with that Category. This is for a given EXTRACT_Date.

So I ran the following:

Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from CUST_TRANSACTION 
where  EXTRACT_DATE = Convert(datetime, '2017-10-31' )
     and RANKING > 1
group by CATEGORY
order by CATEGORY

Generated the following output:

CATEGORY    Max Ranking Average Ranking 
4                 200   133 
5                 300   233 
7                 500   500 
8                 100   100

But Category 4 should have an Average of 150 since customer_Id = 10001 has two entries and Category 5 should be = 200 since Customer_id 10003 has two entries.

When I tried to Group by both Category, Customer_Id, the output includes each combination of Category and Customer_Id, which is what Group by does. So I'm not sure if I need a sub-select or any other ideas?

Thanks

Upvotes: 0

Answers (3)

Richard

Reputation: 27508

Due to different requirements of overall average and maximum you can't use a single column to get both. A sub-select will deliver one column for averaging and another for maximum'ing.

DECLARE @QUERY_DATE DATE = '2017-10-31';

Select 
    CATEGORY
    , MAX(RANKING_detail_max) "Max Ranking"
    , AVG(RANKING_detail_sum) "Average Ranking"
from (
  select CATEGORY
       , CUSTOMER_ID
       , SUM(RANKING) RANKING_detail_sum
       , MAX(RANKING) RANKING_detail_max
  from CUST_TRANSACTION 
  where EXTRACT_DATE = @QUERY_DATE
    and RANKING > 0
  group by CATEGORY, CUSTOMER_ID
) rollup
group by CATEGORY
order by CATEGORY

Upvotes: 0

Alex Kudryashev

Reputation: 9460

You can use Common Table Expression (CTE) to filter out duplicate customerID in a category. Something like this.

;with cte as (
select CATEGORY, RANKING, EXTRACT_DATE
ROW_NUMBER() over(partition by category, customer_id order by customer_id) rn
from CUST_TRANSACTION
)
Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from cte --CUST_TRANSACTION 
where  EXTRACT_DATE = Convert(datetime, '2017-10-31' )
     and RANKING > 1
and rn = 1
group by CATEGORY
order by CATEGORY

Upvotes: 0

ben

Reputation: 329

it looks like you don't care about the trans_number mappings, so you could remove it and choose distinct remaining values in a derived table:

Select CATEGORY, MAX(RANKING) "Max Ranking", AVG(RANKING) "Average Ranking"
from ( select distinct [EXTRACT_DATE] ,
                [CUSTOMER_ID] ,
                [CATEGORY] ,
                [RANKING] from CUST_TRANSACTION )CUST_TRANSACTION
where  EXTRACT_DATE = Convert(datetime, '2017-10-31' )
     and RANKING > 1
group by CATEGORY
order by CATEGORY

Upvotes: 1

Average using SQL Group by needs to omit duplicates and group by more than one column

Answers (3)

Related Questions