b0d
b0d

Reputation: 21

Sqlite - SELECT DISTINCT using GROUP BY

I have a SQLite table in which I would like to filter customers which have bought 2 or more DISTINCT items. The table looks like this:


customer  | product | quantity | date of purchase
---------------------------------------------
John      | coffee  |    1     |    02.20.2019
Bob       | coffee  |    2     |    01.13.2018
Chris     | milk    |    1     |    02.03.2012
Alice     | bread   |    1     |    05.01.2019
John      | banana  |    2     |    08.04.2015
Mary      | orange  |    3     |    12.16.2018
Mary      | orange  |    2     |    04.21.2019
Chris     | pie     |    1     |    07.02.2004

I've managed to select customers by grouping them using two rows in the GROUP BY query (customer and product):

SELECT * FROM test_table GROUP BY customer, product

This (almost) gave me all the customers that I'd originally wanted, except it also included Mary, which is not the desired output (she didn't buy two different products). I've tried to use a COUNT DISTINCT function to only show customers which have bought two or more different products like so:

SELECT customer, product, COUNT(DISTINCT(product)) AS counter FROM test_table
GROUP BY customer, product HAVING counter>1

The query successfully returned Chris and John without Mary, however it only outputs them once, showing only one of the products that they've bought. Is there a way in which I could get Chris and John with all distinct products that they've bought? The expected output of the query:

John      | coffee  |    1     |    02.20.2019
John      | banana  |    2     |    08.04.2015
Chris     | milk    |    1     |    02.03.2012
Chris     | pie     |    1     |    07.02.2004

Upvotes: 2

Views: 1740

Answers (2)

forpas
forpas

Reputation: 164089

You can use EXISTS to get all the rows:

SELECT t.*
FROM test_table t
WHERE EXISTS (SELECT 1 FROM test_table WHERE customer = t.customer AND product <> t.product)

Result:

> customer | product | quantity | date_of_purchase
> :------- | :------ | -------: | :---------------
> John     | coffee  |        1 | 02.20.2019      
> Chris    | milk    |        1 | 02.03.2012      
> John     | banana  |        2 | 08.04.2015      
> Chris    | pie     |        1 | 07.02.2004 

Or if you want 1 row for each customer with the products and dates as comma separated lists you can use GROUP_CONCAT():

SELECT customer, 
       GROUP_CONCAT(product) products,
       GROUP_CONCAT(date_of_purchase) dates
FROM test_table
GROUP BY customer
HAVING COUNT(DISTINCT product) > 1 

Result:

> customer | products      | dates                
> :------- | :------------ | :--------------------
> Chris    | milk,pie      | 02.03.2012,07.02.2004
> John     | coffee,banana | 02.20.2019,08.04.2015

See the demo.

Upvotes: 3

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521239

Using aggregation:

SELECT *
FROM test_table
WHERE customer IN (SELECT customer
                   FROM test_table
                   GROUP BY customer
                   HAVING MIN(product) <> MAX(product));

The HAVING clause above asserts that the "minimum" and "maximum" products are not the same, which implies that there be at least two distinct products associated with each matching customer.

For a more sophisticated solution, we could use analytic functions here:

WITH cte AS (
    SELECT *, MIN(product) OVER (PARTITION BY customer) AS min_product,
              MAX(product) OVER (PARTITION BY customer) AS max_product
    FROM test_table
)

SELECT customer, product, quantity, date_of_purchase
FROM cte
WHERE min_product <> max_product;

Upvotes: 2

Related Questions