jay
jay

Reputation: 1463

Using Count Distinct with Pivot in Snowflake

I am trying to do a pivoting on column Join_mon and get aggregate count for each ID , as shown in following query;

select *
from CTE3
pivot(COUNT(DISTINCT platform_payer_name) for Join_Mon in (
 '2021-03-01',
 '2021-02-01',
  '2021-01-01',
 '2020-12-01'

        ))
  as p
order by ID
)

As you can see I am trying to get distinct count her for the column platform_payer_name. But it is giving the following error;

SQL compilation error: syntax error line 48 at position 16 unexpected 'DISTINCT'

I am quite positive DISTINCT works with COUNT in snowflake. Can I get some help why it is failing here. Help is appreciated.

Upvotes: 0

Views: 1763

Answers (3)

Sheldon
Sheldon

Reputation: 135

Sorry for the lazy answer:

I recently had the exact same issue where I needed a count( distinct value ) as part of the pivot command. Snowflake specifically addresses this in their documentation on the section on distinct counts. You have to aggregate the values as an array as detailed in this section. Then you can use array_size in your select to get what you want.

The relevant part of my query looked like this:

from events pivot (array_unique_agg(version) for node_id in (any))

Upvotes: 0

Lukasz Szozda
Lukasz Szozda

Reputation: 175586

Snowflake supports COUNT_IF:

SELECT id,
       COUNT_IF(join_mon='2021-03-01') AS "2021-03-01",
       COUNT_IF(join_mon='2021-02-01') AS "2021-02-01",
       COUNT_IF(join_mon='2021-01-01') AS "2021-01-01"
FROM (SELECT DISTINCT id, platform_name, join_mon FROM cte) s
GROUP BY id
ORDER BY id;

Upvotes: 0

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

so making some fake data that maps to your pivot, albeit I dropped the excessive paren

with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-02-01'),
        (2,'bb', '2012-03-01'),
        (2,'cc', '2020-12-01')
)
select *
from CTE3 AS c
pivot(COUNT(c.platform_payer_name) for c.Join_Mon in (
         '2021-03-01',
         '2021-02-01',
         '2021-01-01',
         '2020-12-01' )
) as p
order by id;

gives:

ID  '2021-03-01'    '2021-02-01'    '2021-01-01'    '2020-12-01'
1   3               1               0               0
2   0               0               0               1

so makes sense you want distinct in there

but it seems it's not supported..

so while it's somewhat cut'n'paste error prone, it does "work":

with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-02-01'),
        (2,'bb', '2012-03-01'),
        (2,'cc', '2020-12-01')
)
select id
    ,count(distinct(iff(Join_Mon='2021-03-01',platform_payer_name,null))) as "2021-03-01"
    ,count(distinct(iff(Join_Mon='2021-02-01',platform_payer_name,null))) as "2021-02-01"
    ,count(distinct(iff(Join_Mon='2021-01-01',platform_payer_name,null))) as "2021-01-01"
    ,count(distinct(iff(Join_Mon='2020-12-01',platform_payer_name,null))) as "2020-12-01"
from CTE3 AS c
group by 1 order by 1;

gives:

ID  2021-03-01  2021-02-01  2021-01-01  2020-12-01
1   1           1           0           0
2   0           0           0           1

which works because pivot is doing two task, the first is moving values into columns if matching the input, thus that is the same as:

with cte3(id, platform_payer_name, Join_Mon) as (
select * from values
    (1,'aa', '2021-03-01'),
    (1,'aa', '2021-03-01'),
    (1,'aa', '2021-03-01'),
    (1,'aa', '2021-02-01'),
    (2,'bb', '2012-03-01'),
    (2,'cc', '2020-12-01')
)
select id
    ,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
    ,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
    ,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
    ,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
from CTE3 AS c
order by 1;

which gives:

ID, 2021-03-01, 2021-02-01, 2021-01-01, 2020-12-01
1,  aa,         NULL,       NULL,       NULL
1,  aa,         NULL,       NULL,       NULL
1,  aa,         NULL,       NULL,       NULL
1,  NULL,       aa,         NULL,       NULL
2,  NULL,       NULL,       NULL,       NULL
2,  NULL,       NULL,       NULL,       cc

which can then have a count(distinct x) ran over each column.

select id
    ,count(distinct("2021-03-01")) as "2021-03-01"
    ,count(distinct("2021-02-01")) as "2021-02-01"
    ,count(distinct("2021-01-01")) as "2021-01-01"
    ,count(distinct("2020-12-01")) as "2020-12-01"
from (
    select id
        ,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
        ,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
        ,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
        ,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
    from CTE3 AS c
)
group by id
order by id;

or can be done inline as I showed in the first answer.

Upvotes: 1

Related Questions