ZiNNED
ZiNNED

Reputation: 2650

How can I get distinct average and sum values in one MS SQL query?

I am trying to create a query in MS SQL Server 2012 which gives me a count, average and some sum values of distinct records in a database table. I'll try to explain my situation and my wishes as best as I can. If there remains something unclear or if some extra information is needed, please let me know.

Having the following table TEMP with 10 records:

TABLE

╔════════╦═════════════╦════════╦═══════════╗
║ Number ║ DateOfBirth ║ Gender ║ Activity  ║
╠════════╬═════════════╬════════╬═══════════╣
║ 191806 ║ 1940-08-31  ║ F      ║ AMADMIN   ║
║ 196484 ║ 1940-09-23  ║ F      ║ AMHOST    ║
║ 199480 ║ 1949-10-16  ║ F      ║ AMTRAINER ║
║ 201089 ║ 1947-04-08  ║ M      ║ AMTRAINER ║
║ 204528 ║ 1950-05-02  ║ F      ║ AMHOST    ║
║ 226356 ║ 1966-04-12  ║ M      ║ AMADMIN   ║
║ 226356 ║ 1966-04-12  ║ M      ║ AMHOST    ║
║ 377599 ║ 1985-05-15  ║ F      ║ AMADMIN   ║
║ 377599 ║ 1985-05-15  ║ F      ║ AMHOST    ║
║ 395809 ║ 1980-03-03  ║ F      ║ AMADMIN   ║
╚════════╩═════════════╩════════╩═══════════╝

Now, consider running the following query:

SQL

  SELECT COUNT([Number]) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
    SUM(CASE WHEN [Gender] = 'M' THEN 1 ELSE 0 END) AS Male,
    SUM(CASE WHEN [Gender] = 'F' THEN 1 ELSE 0 END) AS Female
  FROM [TEMP]
  WHERE [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')

This query will give me the following result:

RESULT

╔════════╦════════════╦══════╦════════╗
║ Number ║ AverageAge ║ Male ║ Female ║
╠════════╬════════════╬══════╬════════╣
║     10 ║ 57,3       ║    3 ║      7 ║
╚════════╩════════════╩══════╩════════╝

So far so good! But now for the tricky part. What I really want is this result for all distinct records in the table. That means calculating the average age and male/female counts for all persons minus the two "double" persons (having Number 226356 and 377599). So I need a query which produces the following result:

WANTED RESULT

╔════════╦════════════╦══════╦════════╗
║ Number ║ AverageAge ║ Male ║ Female ║
╠════════╬════════════╬══════╬════════╣
║      8 ║ 56,9       ║    2 ║      6 ║
╚════════╩════════════╩══════╩════════╝

I know how to get the distinct records for one piece of the query like so:

SQL

  SELECT COUNT(DISTINCT([Number])) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
    SUM(CASE WHEN [Gender] = 'M' THEN 1 ELSE 0 END) AS Male,
    SUM(CASE WHEN [Gender] = 'F' THEN 1 ELSE 0 END) AS Female
  FROM [TEMP]
  WHERE [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')

But this produces:

RESULT

╔════════╦════════════╦══════╦════════╗
║ Number ║ AverageAge ║ Male ║ Female ║
╠════════╬════════════╬══════╬════════╣
║      8 ║ 57,3       ║    3 ║      7 ║
╚════════╩════════════╩══════╩════════╝

Now the Number count is good, but the AverageAge, Male and Female values are not right.

My question is, how can I adjust my query in such way that I retrieve the values as shown in the WANTED RESULT set, if such a query is even possible to begin with?

Upvotes: 2

Views: 4858

Answers (3)

SQLChao
SQLChao

Reputation: 7847

This query works. Did a subselect to get the base set.

SELECT COUNT([Number]) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
SUM(CASE WHEN [Gender] = 'M' THEN 1 ELSE 0 END) AS Male,
SUM(CASE WHEN [Gender] = 'F' THEN 1 ELSE 0 END) AS Female
FROM 
  (SELECT DISTINCT Number, DateOfBirth, Gender 
   FROM temp where [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')) a

Upvotes: 1

Jenn
Jenn

Reputation: 795

Your query did not solve the problem because you only told sql to use the distinct data points for one of the columns, the number. When sql moves out of the parentheses and on to the calculations for the next columns, it is no longer using the distinct command.

In order to solve your problem, I would recommend the use of a subquery. There are other ways to do this, but I believe a subquery is your best bet because you can first filter down the data and then do the mathematical operations based on just the unique data points. Not all the columns in your data points are duplicates in the rows with the duplicated numbers. However, this is only in the activity column (which we can disregard since it is not necessary in the calculations). I am going to assume that the gender and the date of birth will always be the same. Now, your query will look like:

SELECT COUNT(DISTINCT(t.Number)) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, t.DateOfBirth, GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
    SUM(CASE WHEN t.Gender = 'M' THEN 1 ELSE 0 END) AS Male,
    SUM(CASE WHEN t.Gender = 'F' THEN 1 ELSE 0 END) AS Female
From
(   Select t.number, t.DateOfBirth, t.Gender
    From temp t
    Where activity in ('AMHOST', 'AMADMIN', 'AMTRAINER')
    Group by t.number, t.DateOfBirth, t.Gender) t

Upvotes: 1

GarethD
GarethD

Reputation: 69809

Since activity does not appear in any of the aggregate functions you can simply discount this from the results, and use a subquery to get distinct records before your aggregation, then also applying COUNT(DISTINCT CASE.. to your male/female counts:

SELECT  COUNT(DISTINCT [Number]) AS Number, 
        ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
        COUNT(DISTINCT CASE WHEN [Gender] = 'M' THEN [Number] END) AS Male,
        COUNT(DISTINCT CASE WHEN [Gender] = 'F' THEN [Number] END) AS Female
FROM    (   SELECT  DISTINCT Number, DateOfBirth, Gender
            FROM    [sw_test].[dbo].[TEMP]
            WHERE   [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')
        ) AS t;

Example on SQL Fiddle

Upvotes: 1

Related Questions