Reputation: 2650
I am trying to create a query in MS SQL Server 2012 which gives me a count
, average
and some sum
values of distinct records in a database table. I'll try to explain my situation and my wishes as best as I can. If there remains something unclear or if some extra information is needed, please let me know.
Having the following table TEMP
with 10 records:
TABLE
╔════════╦═════════════╦════════╦═══════════╗
║ Number ║ DateOfBirth ║ Gender ║ Activity ║
╠════════╬═════════════╬════════╬═══════════╣
║ 191806 ║ 1940-08-31 ║ F ║ AMADMIN ║
║ 196484 ║ 1940-09-23 ║ F ║ AMHOST ║
║ 199480 ║ 1949-10-16 ║ F ║ AMTRAINER ║
║ 201089 ║ 1947-04-08 ║ M ║ AMTRAINER ║
║ 204528 ║ 1950-05-02 ║ F ║ AMHOST ║
║ 226356 ║ 1966-04-12 ║ M ║ AMADMIN ║
║ 226356 ║ 1966-04-12 ║ M ║ AMHOST ║
║ 377599 ║ 1985-05-15 ║ F ║ AMADMIN ║
║ 377599 ║ 1985-05-15 ║ F ║ AMHOST ║
║ 395809 ║ 1980-03-03 ║ F ║ AMADMIN ║
╚════════╩═════════════╩════════╩═══════════╝
Now, consider running the following query:
SQL
SELECT COUNT([Number]) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
SUM(CASE WHEN [Gender] = 'M' THEN 1 ELSE 0 END) AS Male,
SUM(CASE WHEN [Gender] = 'F' THEN 1 ELSE 0 END) AS Female
FROM [TEMP]
WHERE [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')
This query will give me the following result:
RESULT
╔════════╦════════════╦══════╦════════╗
║ Number ║ AverageAge ║ Male ║ Female ║
╠════════╬════════════╬══════╬════════╣
║ 10 ║ 57,3 ║ 3 ║ 7 ║
╚════════╩════════════╩══════╩════════╝
So far so good! But now for the tricky part. What I really want is this result for all distinct
records in the table. That means calculating the average age and male/female counts for all persons minus the two "double" persons (having Number
226356
and 377599
). So I need a query which produces the following result:
WANTED RESULT
╔════════╦════════════╦══════╦════════╗
║ Number ║ AverageAge ║ Male ║ Female ║
╠════════╬════════════╬══════╬════════╣
║ 8 ║ 56,9 ║ 2 ║ 6 ║
╚════════╩════════════╩══════╩════════╝
I know how to get the distinct
records for one piece of the query like so:
SQL
SELECT COUNT(DISTINCT([Number])) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
SUM(CASE WHEN [Gender] = 'M' THEN 1 ELSE 0 END) AS Male,
SUM(CASE WHEN [Gender] = 'F' THEN 1 ELSE 0 END) AS Female
FROM [TEMP]
WHERE [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')
But this produces:
RESULT
╔════════╦════════════╦══════╦════════╗
║ Number ║ AverageAge ║ Male ║ Female ║
╠════════╬════════════╬══════╬════════╣
║ 8 ║ 57,3 ║ 3 ║ 7 ║
╚════════╩════════════╩══════╩════════╝
Now the Number
count is good, but the AverageAge
, Male
and Female
values are not right.
My question is, how can I adjust my query in such way that I retrieve the values as shown in the WANTED RESULT set, if such a query is even possible to begin with?
Upvotes: 2
Views: 4858
Reputation: 7847
This query works. Did a subselect to get the base set.
SELECT COUNT([Number]) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
SUM(CASE WHEN [Gender] = 'M' THEN 1 ELSE 0 END) AS Male,
SUM(CASE WHEN [Gender] = 'F' THEN 1 ELSE 0 END) AS Female
FROM
(SELECT DISTINCT Number, DateOfBirth, Gender
FROM temp where [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')) a
Upvotes: 1
Reputation: 795
Your query did not solve the problem because you only told sql to use the distinct data points for one of the columns, the number. When sql moves out of the parentheses and on to the calculations for the next columns, it is no longer using the distinct command.
In order to solve your problem, I would recommend the use of a subquery. There are other ways to do this, but I believe a subquery is your best bet because you can first filter down the data and then do the mathematical operations based on just the unique data points. Not all the columns in your data points are duplicates in the rows with the duplicated numbers. However, this is only in the activity column (which we can disregard since it is not necessary in the calculations). I am going to assume that the gender and the date of birth will always be the same. Now, your query will look like:
SELECT COUNT(DISTINCT(t.Number)) AS Number, ROUND(AVG(CAST(DATEDIFF(DAY, t.DateOfBirth, GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
SUM(CASE WHEN t.Gender = 'M' THEN 1 ELSE 0 END) AS Male,
SUM(CASE WHEN t.Gender = 'F' THEN 1 ELSE 0 END) AS Female
From
( Select t.number, t.DateOfBirth, t.Gender
From temp t
Where activity in ('AMHOST', 'AMADMIN', 'AMTRAINER')
Group by t.number, t.DateOfBirth, t.Gender) t
Upvotes: 1
Reputation: 69809
Since activity does not appear in any of the aggregate functions you can simply discount this from the results, and use a subquery to get distinct records before your aggregation, then also applying COUNT(DISTINCT CASE..
to your male/female counts:
SELECT COUNT(DISTINCT [Number]) AS Number,
ROUND(AVG(CAST(DATEDIFF(DAY, [DateOfBirth], GETDATE()) / 365.2425 AS FLOAT)), 1) AS AverageAge,
COUNT(DISTINCT CASE WHEN [Gender] = 'M' THEN [Number] END) AS Male,
COUNT(DISTINCT CASE WHEN [Gender] = 'F' THEN [Number] END) AS Female
FROM ( SELECT DISTINCT Number, DateOfBirth, Gender
FROM [sw_test].[dbo].[TEMP]
WHERE [Activity] IN ('AMHOST', 'AMADMIN', 'AMTRAINER')
) AS t;
Upvotes: 1