aadu
aadu

Reputation: 3254

Counting distinct id's grouped by a different id producing incorrect count's

I have an SQLite database table that has a similar structure to the following...

name     MicrosoftId     EventId

red      1               10001
blue     1               10001
green    2               10001
blue     2               10001
grey     3               10001

red      4               10002
green    5               10002
blue     5               10002

etc...

...and I am using the following query to generate some data about the table:

SELECT EventId as 'event', COUNT(DISTINCT MicrosoftId) as 'size',
SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) as 'red',
SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) as 'blue',
SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) as 'green'
FROM TagsMSCV 
WHERE name IN ('red','blue','green')
GROUP BY EventId

The bit that is going wrong is the 'size' column I'm making. It's supposed to be how many unique MicrosoftId's are in an EventId. So for the above example, EventId 10001 should be a size of 3, and EventId 10002 should be a size of 2.

The SQL I have written works without error, but the size's from COUNT(DISTINCT MicrosoftId) are all wrong and I can't figure out why. What am I doing wrong?

Upvotes: 1

Views: 41

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 95053

The problem is your WHERE clause. It removes all colors except 'red', 'blue', and 'green', so you don't count the other colors. Remove the WHERE clause and you are fine.

UPDATE: You only want to show events that have at least one entry for red, green or blue. So add a HAVING clause or put your query in a subquery in order to use WHERE. Here are some options:

HAVING with separate conditions:

SELECT 
  EventId as event,
  COUNT(DISTINCT MicrosoftId) as size,
  SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) as red,
  SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) as blue,
  SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) as green
FROM TagsMSCV 
GROUP BY EventId
HAVING SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) > 0
    OR SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) > 0 
    OR SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) > 0 
ORDER BY event;

HAVING with one compound condition:

SELECT 
  EventId as event,
  COUNT(DISTINCT MicrosoftId) as size,
  SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) as red,
  SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) as blue,
  SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) as green
FROM TagsMSCV 
GROUP BY EventId
HAVING SUM(CASE WHEN name IN ('red', 'blue', 'green') THEN 1 ELSE 0 END) > 0
ORDER BY event;

WHERE on separate conditions:

SELECT *
FROM
(
  SELECT 
    EventId as event,
    COUNT(DISTINCT MicrosoftId) as size,
    SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) as red,
    SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) as blue,
    SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) as green
  FROM TagsMSCV 
  GROUP BY EventId
) q
WHERE red > 0 OR blue > 0 OR green > 0
ORDER BY event;

WHERE on a compound condition:

SELECT *
FROM
(
  SELECT 
    EventId as event,
    COUNT(DISTINCT MicrosoftId) as size,
    SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) as red,
    SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) as blue,
    SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) as green
  FROM TagsMSCV 
  GROUP BY EventId
) q
WHERE red + blue + green > 0
ORDER BY event;

Upvotes: 2

Yogesh Sharma
Yogesh Sharma

Reputation: 50173

The problem is with WHERE clause, your WHERE clause filter out the name hence the count() will count only filtered names which are 'red','blue','green'.

So, just remove the WHERE clause :

SELECT EventId as 'event', COUNT(DISTINCT MicrosoftId) as 'size',
       SUM(CASE WHEN name = 'red' THEN 1 ELSE 0 END) as 'red',
       SUM(CASE WHEN name = 'blue' THEN 1 ELSE 0 END) as 'blue',
       SUM(CASE WHEN name = 'green' THEN 1 ELSE 0 END) as 'green'
FROM TagsMSCV 
GROUP BY EventId;

Upvotes: 2

Related Questions