mjdxb
mjdxb

Reputation: 85

SQL Distinct with multiple Criteria

I want to produce a table output as per the SampleData Table & Output. My SQL Query is as follows but it is not giving me the right result for last three columns

SELECT
  UserLogin,
  COUNT(DISTINCT consumerID) AS TotalUniqueconsumerIDs,  
  COUNT(DISTINCT (CASE
        WHEN ActivitiesCompleted > 0 AND QuizCompleted <= 0 THEN consumerID END)) AS YesActivityNoQuiz,
  COUNT(DISTINCT (CASE
        WHEN ActivitiesCompleted <= 0 AND QuizCompleted > 0 THEN consumerID END)) AS NoActivityYesQuiz,  
COUNT(DISTINCT (CASE
        WHEN ActivitiesCompleted >= 1 AND QuizCompleted >= 1 THEN consumerID END)) AS YesActivityYesQuiz
FROM
  TableName
GROUP BY
  UserLogin

SampleData Table & Output

Any help is highly appreciated.


More info:

Perhaps I wasn't clear above so providing more info here:

I created another sample table with just data of 1 UserLogin which looks like this and if I pivot this sample data set externally in Excel I see this

As you can see that this pivot screenshot both of the consumerID in this table has more than 1 ActivitiesCompleted as well as 1 QuizCompleted.

The SQL Output based on my query above looks like this

However, based on Pivot screenshot, I am expecting 1 in YesActivityYesQuiz column and zero in YesActivityNoQuiz & NoActivityYesQuiz


Sample Data:

Date,UserLogin,consumerID,ActivitiesCompleted,QuizCompleted
6/8/2018,aasufhar,consumerA,0,1
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,2,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,1,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerA,0,0
6/8/2018,aasufhar,consumerB,0,1
6/8/2018,aasufhar,consumerB,0,0
6/8/2018,aasufhar,consumerB,2,0
6/8/2018,aasufhar,consumerB,1,0
6/8/2018,aasufhar,consumerB,1,0
6/8/2018,aasufhar,consumerB,1,0
6/8/2018,aasufhar,consumerB,0,0
6/8/2018,aasufhar,consumerB,0,0
6/8/2018,aasufhar,consumerB,0,0
6/8/2018,aasufhar,consumerB,0,0
6/8/2018,aasufhar,consumerB,0,0

Upvotes: 0

Views: 1667

Answers (3)

maddy
maddy

Reputation: 50

You have to include ActivitiesCompleted in your group by clause because you are using that in your case statement. Its not throwing you error?

SELECT
  UserLogin,
  COUNT(DISTINCT consumerID) AS TotalUniqueconsumerIDs,  
  COUNT(DISTINCT (CASE
        WHEN ActivitiesCompleted > 0 AND QuizCompleted <= 0 THEN consumerID END)) AS YesActivityNoQuiz,
  COUNT(DISTINCT (CASE
        WHEN ActivitiesCompleted <= 0 AND QuizCompleted > 0 THEN consumerID END)) AS NoActivityYesQuiz,  
COUNT(DISTINCT (CASE
        WHEN ActivitiesCompleted >= 1 AND QuizCompleted >= 1 THEN consumerID END)) AS YesActivityYesQuiz
FROM
  TableName
GROUP BY
  UserLogin,ActivitiesCompleted 

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

The query in question seems to be correct to me (at least it returns what is shown in expected result)

Still, I can propose less verbose version of it (BigQuery Standard SQL)

#standardSQL
SELECT UserLogin, 
  COUNT(DISTINCT consumerID) AS TotalUniqueconsumerIDs,  
  COUNT(DISTINCT IF(ActivitiesCompleted > 0 AND QuizCompleted <= 0, consumerID, NULL)) AS YesActivityNoQuiz,
  COUNT(DISTINCT IF(ActivitiesCompleted <= 0 AND QuizCompleted > 0, consumerID, NULL)) AS NoActivityYesQuiz,  
  COUNT(DISTINCT IF(ActivitiesCompleted >= 1 AND QuizCompleted >= 1, consumerID, NULL)) AS YesActivityYesQuiz
FROM TableName
GROUP BY UserLogin

Updated query based on recent example/explanation from OP

#standardSQL
SELECT UserLogin, 
  COUNT(DISTINCT consumerID) AS TotalUniqueconsumerIDs,  
  COUNT(DISTINCT IF(ActivitiesCompleted > 0 AND QuizCompleted <= 0, consumerID, NULL)) AS YesActivityNoQuiz,
  COUNT(DISTINCT IF(ActivitiesCompleted <= 0 AND QuizCompleted > 0, consumerID, NULL)) AS NoActivityYesQuiz,  
  COUNT(DISTINCT IF(ActivitiesCompleted >= 1 AND QuizCompleted >= 1, consumerID, NULL)) AS YesActivityYesQuiz,
  COUNT(DISTINCT IF(ActivitiesCompleted <= 0 AND QuizCompleted <= 0, consumerID, NULL)) AS NoActivityNoQuiz
FROM (
  SELECT UserLogin,consumerID,
    SUM(ActivitiesCompleted) ActivitiesCompleted,
    SUM(QuizCompleted) QuizCompleted
  FROM TableName
  GROUP BY UserLogin,consumerID
)
GROUP BY UserLogin

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271013

Perhaps you just want max():

SELECT UserLogin, COUNT(DISTINCT consumerID) AS TotalUniqueconsumerIDs,  
       MAX(CASE WHEN ActivitiesCompleted > 0 AND QuizCompleted <= 0 THEN 1 ELSE 0 END) AS YesActivityNoQuiz,
       MAX(CASE WHEN ActivitiesCompleted <= 0 AND QuizCompleted > 0 THEN 1 ELSE 0 END) AS NoActivityYesQuiz,  
       MAX(CASE WHEN ActivitiesCompleted >= 1 AND QuizCompleted >= 1 THEN 1 ELSE 0 END) AS YesActivityYesQuiz
FROM TableName
GROUP BY UserLogin;

Upvotes: 1

Related Questions