akshay gvs
akshay gvs

Reputation: 15

How to get distinct count over multiple columns in Hive SQL?

I have a table that looks like this. And I want to get the distinct count horizontally across the three columns ignoring nulls.

ID Column1 Column 2 Column 3
1 A B C
2 A A B
3 A A

The desired output I'm looking for is:

ID Column1 Column 2 Column 3 unique_count
1 A B C 3
2 A A B 2
3 A A 1

Upvotes: 0

Views: 1314

Answers (2)

shawnt00
shawnt00

Reputation: 17915

case when C1 not in (C2, C3) then 1 else 0 end +
case when C2 not in (C3)     then 1 else 0 end + 1

This will not work if you intend to count nulls. The pattern would extend to more columns by successively comparing each one to all columns to its right. The order doesn't strictly matter. There's just no point in repeating the same test over and over.

If the values were alphabetically ordered then you could test only adjacent pairs to look for differences. While that applies to your limited sample it would not be the most general case.

Using a column pivot with a distinct count aggregate is likely to be a lot less efficient, less portable, and a lot less adaptable to a broad range of queries.

Upvotes: 1

Jaytiger
Jaytiger

Reputation: 12234

One possible option would be

WITH sample AS (
  SELECT 'A' Column1, 'B' Column2, 'C' Column3 UNION ALL
  SELECT 'A', 'A', 'B' UNION ALL
  SELECT 'A', 'A', NULL UNION ALL
  SELECT '', 'A', NULL
)
SELECT Column1, Column2, Column3, COUNT(DISTINCT NULLIF(TRIM(c), '')) unique_count
  FROM (SELECT *, ROW_NUMBER() OVER () rn FROM sample) t LATERAL VIEW EXPLODE(ARRAY(Column1, Column2, Column3)) tf AS c
 GROUP BY Column1, Column2, Column3, rn;
output
+---------+---------+---------+--------------+
| column1 | column2 | column3 | unique_count |
+---------+---------+---------+--------------+
|         | A       | NULL    |            1 |
| A       | A       | NULL    |            1 |
| A       | A       | B       |            2 |
| A       | B       | C       |            3 |
+---------+---------+---------+--------------+

Upvotes: 1

Related Questions