Reputation: 15
I have a table that looks like this. And I want to get the distinct count horizontally across the three columns ignoring nulls.
ID | Column1 | Column 2 | Column 3 |
---|---|---|---|
1 | A | B | C |
2 | A | A | B |
3 | A | A |
The desired output I'm looking for is:
ID | Column1 | Column 2 | Column 3 | unique_count |
---|---|---|---|---|
1 | A | B | C | 3 |
2 | A | A | B | 2 |
3 | A | A | 1 |
Upvotes: 0
Views: 1314
Reputation: 17915
case when C1 not in (C2, C3) then 1 else 0 end +
case when C2 not in (C3) then 1 else 0 end + 1
This will not work if you intend to count nulls. The pattern would extend to more columns by successively comparing each one to all columns to its right. The order doesn't strictly matter. There's just no point in repeating the same test over and over.
If the values were alphabetically ordered then you could test only adjacent pairs to look for differences. While that applies to your limited sample it would not be the most general case.
Using a column pivot with a distinct count aggregate is likely to be a lot less efficient, less portable, and a lot less adaptable to a broad range of queries.
Upvotes: 1
Reputation: 12234
One possible option would be
WITH sample AS (
SELECT 'A' Column1, 'B' Column2, 'C' Column3 UNION ALL
SELECT 'A', 'A', 'B' UNION ALL
SELECT 'A', 'A', NULL UNION ALL
SELECT '', 'A', NULL
)
SELECT Column1, Column2, Column3, COUNT(DISTINCT NULLIF(TRIM(c), '')) unique_count
FROM (SELECT *, ROW_NUMBER() OVER () rn FROM sample) t LATERAL VIEW EXPLODE(ARRAY(Column1, Column2, Column3)) tf AS c
GROUP BY Column1, Column2, Column3, rn;
+---------+---------+---------+--------------+
| column1 | column2 | column3 | unique_count |
+---------+---------+---------+--------------+
| | A | NULL | 1 |
| A | A | NULL | 1 |
| A | A | B | 2 |
| A | B | C | 3 |
+---------+---------+---------+--------------+
Upvotes: 1