sunil kancharlapalli
sunil kancharlapalli

Reputation: 105

How to get distinct rows in hive if a column is blank

I am using hive to generate my output and trying to get distinct rows in my output.

I am using select distinct c1,c2,c3,c4 from table.

I have the below rows in the table as example

C1       C2    C3    C4

Value1 value2      value4

Value1 value2      value4

here, the column C3 is a blank and the expected output is just a single column among those. But even after using the distinct keyword, I am not able to get a single row as output. I am seeing both the records in the output.

Please help me if anyone have faced this issue before and possible solution if you have one.

Upvotes: 0

Views: 1430

Answers (2)

sumitya
sumitya

Reputation: 2691

Yes this is the case ,which is hard to debug. Column C3 might contain value which is neither null even nor ''. you should consider casting column C3.

select distinct col1,cast(col2 as string),col3 from test_nulls;

Upvotes: 0

Nishu Tayal
Nishu Tayal

Reputation: 20830

The concept of DISTINCT is : It specifies removal of duplicate rows from the result set.

Let's say, you are using distinct on multiple columns,It will consider each column unique value.

Example:

hive> SELECT col1, col2 FROM t1
    1 3
    1 3
    1 4
    2 5
hive> SELECT DISTINCT col1, col2 FROM t1
    1 3
    1 4
    2 5
hive> SELECT DISTINCT col1 FROM t1
    1
    2

In your case, there may be possibiity that any column can have different values, and that's why you are getting 2 rows as output.

Please refer the HIVE DISTINCT

Upvotes: 1

Related Questions