Reputation: 2877
I have data as follows:
Person | Group
-----------------
Andy | 1
Andy | 2
Doug | 2
Jack | 1
Carl | 2
Joe | 1
Joe | 2
I need an output from this data as follows:
num_persons | num_persons_in_both_groups | overlap
--------------------------------------------------------
5 | 2 | 40%
I can get the num_persons
but am confounded by num_persons_in_both_groups
. How can I get this using SQL? Assuming it's MySQL. overlap
is straightforward.
Thanks!
Upvotes: 1
Views: 49
Reputation: 4335
Great question! Try the below:
SELECT SUM(GroupCount) AS num_persons
, SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups
, (SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / SUM(GroupCount))*100 AS overlap
FROM
(
SELECT COUNT(1) AS PersonsInGroups
, 1 AS GroupCount
FROM t
GROUP BY Person
) x
Per the comment, you could actually just write it like:
SELECT COUNT(1) AS num_persons
, SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups
, (SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / COUNT(1))*100 AS overlap
FROM
(
SELECT COUNT(1) AS PersonsInGroups
FROM t
GROUP BY Person
) x
Upvotes: 3
Reputation: 1269503
I would recommend two levels of aggregation:
select count(*) as num_persons,
sum(group_1 > 0 and group_2 > 0) as in_both,
avg(group_1 > 0 and group_2 > 0) as ratio
from (select person,
max( group = 1 ) as group_1,
max( group = 2 ) as group_2
from t
group by person
) p
I like this because it is flexible.
However, you can also do this without a subquery, assuming exactly two groups:
select count(distinct person) as num_persons,
count(*) - count(distinct person) as num_in_both,
(count(*) - count(distinct person)) / count(distinct person) as ratio
from t;
Upvotes: 2