Reputation: 647
I have a table:
id age tag
1 26 green
1 26 blue
2 28 yellow
3 45 red
4 40 blue
4 40 green
5 50 red
I need to find both the average and standard deviation of the number of distinct tags per age group (I have coded this with CASE WHEN).
So to get for example:
average_age average_tags sd_tags age_group
26.7 1.5 0.1 25-34
40 2 0 35-44
45 1 0.01 44+
Is there a way to more efficiently calculate the average count of tags and also to avoid repeating the CASE code in the sub-query for efficiency / readability?
SELECT
AVG(age) as average_age,
AVG(countField) as average_spread,
STDDEV(countField) as sd_spread,
CASE
WHEN v.age BETWEEN 25 AND 34 THEN '25-34'
WHEN v.age BETWEEN 35 AND 44 THEN '35-44'
ELSE '44+'
END AS age_group,
FROM vid v
(
SELECT AVG(v.age),
COUNT(DISTINCT(p.tag)) AS countField,
CASE
WHEN v.age BETWEEN 25 AND 34 THEN '25-34'
WHEN v.age BETWEEN 35 AND 44 THEN '35-44'
ELSE '44+'
END AS age_group,
FROM vid v
GROUP BY age_group
) as q WHERE age IS NOT NULL
GROUP BY age_group ORDER BY age_group;
Upvotes: 1
Views: 81
Reputation: 6638
you can use this query
select AVG(T.Age) as average_age,AVG(countField) as average_spread,age_group
from (select v.Age,COUNT(DISTINCT(v.Tag)) as countField,CASE
WHEN v.age BETWEEN 25 AND 34 THEN '25-34'
WHEN v.age BETWEEN 35 AND 44 THEN '35-44'
ELSE '44+' END AS age_group
from VID v
group by v.Age) T
WHERE age IS NOT NULL
GROUP BY age_group ORDER BY age_group;
Upvotes: 1