lucas winter
lucas winter

Reputation: 191

Count Distinct Window Function with Groupby

I have a table that contains a user's name, market, and purchase_id. I'm trying to use a window function in SnowSql without a subquery to count the distinct number of purchases a user bought as well as the total number of unique purchases for the market.

Initial Table

User Market Purchase_ID
John Smith NYC 1
John Smith NYC 2
Bob Miller NYC 2
Bob Miller NYC 4
Tim Wilson NYC 3

The desired result would look like this:

User Purchases Unique Market Purchases
John Smith 2 4
Bob Miller 2 4
Tim Wilson 1 4

The query I've been attempting without a subquery looks like the below but receives an error with the groupby.

SELECT 
  user,
  COUNT(DISTINCT purchase_id),
  COUNT(DISTINCT purchase_id) OVER (partition by market)
FROM table
GROUP BY 1

Appreciate any assistance with this. Thanks!

Upvotes: 2

Views: 1246

Answers (3)

Adrian White
Adrian White

Reputation: 1804

This might work , you could wrangle to get into the format you're after but it produces the answer without subquery.

Uses the awesome GROUPING SETS which allows multiple group-by clauses in a single statement - the exact error you were hitting :-).

Awesome question!

  SELECT 
      COUNT(DISTINCT PURCHASE_ID)  
    , USER_NAME
    , MARKET
 FROM 
    CTE
  GROUP BY 
    GROUPING SETS (USER_NAME, MARKET);

enter image description here

Copy|Paste|Run

WITH CTE AS (SELECT 'JOHN SMITH' USER_NAME, 'NYC' MARKET,   1 
PURCHASE_ID
UNION SELECT 'JOHN SMITH' USER_NAME,    'NYC' MARKET,   2 PURCHASE_ID
UNION SELECT 'BOB MILLER' USER_NAME,    'NYC' MARKET,   2 PURCHASE_ID
UNION SELECT 'BOB MILLER' USER_NAME,    'NYC' MARKET,   4 PURCHASE_ID
UNION SELECT 'TIM WILSON' USER_NAME,    'NYC' MARKET,   3 PURCHASE_ID) 

SELECT 
      COUNT(DISTINCT PURCHASE_ID)  
    , USER_NAME
    , MARKET
FROM 
    CTE
GROUP BY 
    GROUPING SETS (USER_NAME, MARKET);

Upvotes: 1

nbk
nbk

Reputation: 49375

DISTTNCT in a window function is not allowed, so you need to use the subquery

CREATE TABLE table1
    (`User` varchar(10), `Market` varchar(3), `Purchase_ID` int)
;
    
INSERT INTO table1
    (`User`, `Market`, `Purchase_ID`)
VALUES
    ('John Smith', 'NYC', 1),
    ('John Smith', 'NYC', 2),
    ('Bob Miller', 'NYC', 2),
    ('Bob Miller', 'NYC', 4),
    ('Tim Wilson', 'NYC', 3)
;
SELECT 
  user,
  COUNT(DISTINCT purchase_id)
  ,MAX((SELECT COUNT(DISTINCT purchase_id) FROM table1 WHERE `Market` = t1.`Market` )) bymarkte
FROM table1 t1
GROUP BY 1
user       | COUNT(DISTINCT purchase_id) | bymarkte
:--------- | --------------------------: | -------:
Bob Miller |                           2 |        4
John Smith |                           2 |        4
Tim Wilson |                           1 |        4
SELECT 
  user,
  COUNT(DISTINCT purchase_id)
  ,MAX(countr) bymarkte
FROM table1 t1
INNER JOIN (SELECT `Market`,COUNT(DISTINCT purchase_id) countr FROM table1 GROUP BY  `Market` ) ta ON t1.`Market` = ta.`Market`

GROUP BY 1
user       | COUNT(DISTINCT purchase_id) | bymarkte
:--------- | --------------------------: | -------:
Bob Miller |                           2 |        4
John Smith |                           2 |        4
Tim Wilson |                           1 |        4

db<>fiddle here

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269873

I don't think you can do this simply as an aggregation. But you can get the answer like this:

SELECT user,
       SUM( (seqnum = 1)::INT ) as purchases,
       SUM(SUM( (seqnum = 1)::INT )) OVER (PARTITION BY market) as market_purchases
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY purchase_id ORDER BY purchase_id) as seqnum
      FROM table t
     ) t
GROUP BY 1

Upvotes: 1

Related Questions