Moshe Brodsky
Moshe Brodsky

Reputation: 315

Count Distinct Duplicates Within Groups in MySql

I have an SQL table called main_table:

id | product_id | purchase_id | purchaser_id
---+------------+-------------+-------------
 1 |          1 |           1 |            1
 2 |          1 |           2 |            1
 3 |          1 |           3 |            1
 4 |          1 |           4 |            2
 5 |          1 |           5 |            2
 6 |          1 |           6 |            3
 7 |          2 |           1 |            1
 8 |          2 |           4 |            2
 9 |          2 |           5 |            2
10 |          2 |           7 |            2
11 |          2 |           8 |            2
12 |          2 |           6 |            3
13 |          2 |           9 |            3
14 |          2 |          10 |            3
15 |          2 |          11 |            3
16 |          2 |          12 |            4
17 |          2 |          13 |            4

I need to group by product_id and find the four things:

So the first 3 are relatively simple...

SELECT FROM `main_table`
  product_id,
  COUNT(DISTINCT `purchase_id`) AS `purchases`,
  COUNT(DISTINCT `purchaser_id`) AS `purchasers`,
  (COUNT(DISTINCT `purchase_id`) - COUNT(DISTINCT `purchaser_id`)) AS `repeat_purchases`,
  (??????) AS `repeat_purchasers`
GROUP BY product_id
ORDER BY product_id ASC

What is the ?????? in order to get the following table:

product_id | purchases | purchasers | repeat_purchases | repeat_purchasers
-----------+-----------+------------+------------------+------------------
         1 |         6 |          3 |                3 |                2 
         2 |        11 |          4 |                7 |                3 

Upvotes: 0

Views: 1048

Answers (2)

Dwipam Katariya
Dwipam Katariya

Reputation: 144

I would do something like this:

select a.product_id, count(*) as purchases, count(distinct(a.purchaser_id)) as 
purchasers, count(*) - count(distinct(a.purchaser_id)) as repeat_purchases, 
b.repeat_purchasers from main_table a, 
(select x.product_id, count(*) as repeat_purchasers from 
   (select y.product_id, y.purchaser_id from main_table y 
    group by y.purchaser_id, y.product_id having y.count > 1) x 
 group by x.product_id) b group by
a.product_id,b.repeat_purchasers,b.product_id having 
a.product_id = b.product_id`

This is essentially same as John's, but without JOIN

Upvotes: 0

John Woo
John Woo

Reputation: 263803

SELECT  a.product_id,
        COUNT(DISTINCT a.purchase_id) AS purchases,
        COUNT(DISTINCT a.purchaser_id) AS purchasers,
        (COUNT(DISTINCT a.purchase_id) - COUNT(DISTINCT a.purchaser_id)) AS repeat_purchases,
        COALESCE(c.totalCount,0) AS repeat_purchasers
FROM    main_table a
        LEFT JOIN
        (
            SELECT  product_id, COUNT(totalCOunt) totalCount
            FROM    
                    (
                        SELECT  product_id, purchaser_id, COUNT(*) totalCOunt
                        FROM    main_table
                        GROUP   BY product_id, purchaser_id
                        HAVING  COUNT(*) > 1
                    ) s
            GROUP   BY product_id
        ) c ON  a.product_id = c.product_id
GROUP   BY product_id

OUTPUT

╔════════════╦═══════════╦════════════╦══════════════════╦═══════════════════╗
║ PRODUCT_ID ║ PURCHASES ║ PURCHASERS ║ REPEAT_PURCHASES ║ REPEAT_PURCHASERS ║
╠════════════╬═══════════╬════════════╬══════════════════╬═══════════════════╣
║          1 ║         6 ║          3 ║                3 ║                 2 ║
║          2 ║        11 ║          4 ║                7 ║                 3 ║
╚════════════╩═══════════╩════════════╩══════════════════╩═══════════════════╝

Upvotes: 2

Related Questions