Reputation: 85
I have some data with only 3 attributes
name illness yes/no
I'm interested in the frequency of how many times someone has been ill from a set group of viruses whether its once, twice, thrice.
So I end up with a table like this
Frequency - Count
F1 300
F2 121921
F3 121
F4 2
F5+ 2323
This is some of the code I tried to write. The bit I am stuck on is how do I tell SQL that when a person hasnt been exposed to the virus give them a score of 0, if its once then 1, twice then 2.
SELECT
CASE WHEN F0 = 0 THEN 'F0'
WHEN F1 = 1 THEN 'F1'
WHEN F2 = 1 THEN 'F2'
wHEN F3 = 3 THEN 'F3'
WHEN F4 = 4 THEN 'F4'
WHEN F5 >= 5 THEN 'F5
END AS Virus_Frequency,
COUNT(*) AS VIRUS_COUNT
FROM
(
SELECT
distinct name,
SUM(CASE WHEN illness in (43598435, 24343, 45413, 2455, 54574511, 2074123, 34351153) THEN 0 END) AS F0,
Upvotes: 0
Views: 118
Reputation: 1269953
I think you want a histogram of histograms query:
SELECT VIRUS_COUNT, COUNT(*) as person_count, MIN(user_id), MAX(user_id)
FROM (SELECT user_id, COUNT(*) as VIRUS_COUNT
FROM t
WHERE Virus_id in (43598435, 24343, 45413, 2455, 54574511, 2074123, 34351153)
GROUP BY user_id
) t
GROUP BY VIRUS_COUNT
ORDER BY VIRUS_COUNT;
I include sample user ids in such a query, just so there are references for further investigation.
Upvotes: 2