Plasty Grove
Plasty Grove

Reputation: 2877

Count number of persons in both groups

I have data as follows:

Person  |  Group
-----------------
Andy    |  1
Andy    |  2
Doug    |  2
Jack    |  1
Carl    |  2
Joe     |  1
Joe     |  2

I need an output from this data as follows:

num_persons  |  num_persons_in_both_groups  |  overlap
--------------------------------------------------------
5            |  2                           |  40%

I can get the num_persons but am confounded by num_persons_in_both_groups. How can I get this using SQL? Assuming it's MySQL. overlap is straightforward.

Thanks!

Upvotes: 1

Views: 49

Answers (2)

kjmerf
kjmerf

Reputation: 4335

Great question! Try the below:

SELECT SUM(GroupCount) AS num_persons
    , SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups
    , (SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / SUM(GroupCount))*100 AS overlap
FROM
(
SELECT COUNT(1) AS PersonsInGroups
    , 1 AS GroupCount
FROM t
GROUP BY Person
) x

Per the comment, you could actually just write it like:

SELECT COUNT(1) AS num_persons
    , SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups
    , (SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / COUNT(1))*100 AS overlap
FROM
(
SELECT COUNT(1) AS PersonsInGroups
FROM t
GROUP BY Person
) x

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269503

I would recommend two levels of aggregation:

select count(*) as num_persons,
       sum(group_1 > 0 and group_2 > 0) as in_both,
       avg(group_1 > 0 and group_2 > 0) as ratio
from (select person,
             max( group = 1 ) as group_1,
             max( group = 2 ) as group_2
      from t
      group by person
     ) p

I like this because it is flexible.

However, you can also do this without a subquery, assuming exactly two groups:

select count(distinct person) as num_persons,
       count(*) - count(distinct person) as num_in_both,
       (count(*) - count(distinct person)) / count(distinct person) as ratio
from t;

Upvotes: 2

Related Questions