Reputation: 105
I am using hive to generate my output and trying to get distinct rows in my output.
I am using select distinct c1,c2,c3,c4 from table
.
I have the below rows in the table as example
C1 C2 C3 C4
Value1 value2 value4
Value1 value2 value4
here, the column C3 is a blank and the expected output is just a single column among those. But even after using the distinct keyword, I am not able to get a single row as output. I am seeing both the records in the output.
Please help me if anyone have faced this issue before and possible solution if you have one.
Upvotes: 0
Views: 1430
Reputation: 2691
Yes this is the case ,which is hard to debug. Column C3 might contain value which is neither null even nor ''. you should consider casting
column C3.
select distinct col1,cast(col2 as string),col3 from test_nulls;
Upvotes: 0
Reputation: 20830
The concept of DISTINCT is : It specifies removal of duplicate rows from the result set.
Let's say, you are using distinct on multiple columns,It will consider each column unique value.
Example:
hive> SELECT col1, col2 FROM t1
1 3
1 3
1 4
2 5
hive> SELECT DISTINCT col1, col2 FROM t1
1 3
1 4
2 5
hive> SELECT DISTINCT col1 FROM t1
1
2
In your case, there may be possibiity that any column can have different values, and that's why you are getting 2 rows as output.
Please refer the HIVE DISTINCT
Upvotes: 1