A Dolegowski
A Dolegowski

Reputation: 93

How to find and count unique values across columns?

I'm using Snowflake and I'm hoping someone can help me understand how to count the number of unique names across columns for each row and overlook null values. Here is a sample of the data. So you can see below I'm looking to count the number of distinct values across the variables Name 1, Name 2, name 3, Name 4.

ID | Type    | Name 1 | Name 2 | Name 3 | Name 4 | Expected result
1  | animal  | cat    | Dog    | null   | Dog    | 2   
2  | animal  | fish   | cat    | cat    | cat    | 2
3  | animal  | fish   | cat    | dog    | rat    | 4

Upvotes: 0

Views: 1210

Answers (2)

karunP
karunP

Reputation: 91

Semi-structured functions can help here

select *,
      array_size(array_distinct(array_construct_compact(column1, column2, column3, column4))) _count
from values
    ('cat','dog', null,'dog'),
    ('fish', 'cat', 'cat', 'cat'),
    ('fish', 'cat', 'dog', 'rat'),
    (null, null, null, null)

Result:

COLUMN1 |COLUMN2|COLUMN3|COLUMN4|_COUNT
cat     |dog    |null   |dog    |2
fish    |cat    |cat    |cat    |2
fish    |cat    |dog    |rat    |4
null    |null   |null   |null   |0

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522797

You could use a unpivot approach:

SELECT ID, Type, COUNT(DISTINCT Name) AS cnt
FROM
(
    SELECT ID, Type, Name1 AS Name FROM yourTable UNION ALL
    SELECT ID, Type, Name2         FROM yourTable UNION ALL
    SELECT ID, Type, Name3         FROM yourTable UNION ALL
    SELECT ID, Type, Name4         FROM yourTable
) t
GROUP BY ID, Type;

Demo

This approach works by unpivoting the name data to a format where one record has just one ID and one name. Then, we aggregate and take the distinct count. The COUNT() function works well here, because by default it ignores NULL values, which is the behavior you want.

Upvotes: 2

Related Questions