plhn
plhn

Reputation: 5273

Hive gives a record by ‘distinct’ clause, but ‘count’ is 0

look at these.

0: jdbc:hive2> select distinct A_COL from A_TABLE where A_COL='1999-05-04';
+-------------+--+
|    A_COL    |
+-------------+--+
| 1999-05-04  |
+-------------+--+
1 row selected (6.127 seconds)

0: jdbc:hive2> select count(*) from A_TABLE where A_COL='1999-05-04';
+------+--+
| _c0  |
+------+--+
| 0    |
+------+--+
1 row selected (4.206 seconds)

0: jdbc:hive2> select * from A_TABLE where A_COL='1999-05-04';
+-------+---------------
| A_COL | blabla...
+-------+----------
+-------+----------
No rows selected (0.292 seconds)

I think the first query means ‘at least one record whose A_COL’s value is '1999-05-04' exists.’
But the second and third query deny that.

How can it be possible?

Upvotes: 2

Views: 455

Answers (1)

Ronak Patel
Ronak Patel

Reputation: 3849

I don't know if your table has partition on column A_COL or not but this can be the case. so the distinct is showing partition value for it but table has NO actual data:

hive> CREATE TABLE test_2(B_COL string) PARTITIONED BY (A_COL string);
OK
Time taken: 1.182 seconds

hive> ALTER TABLE test_2 ADD PARTITION (A_COL='2016-07-27');
OK
Time taken: 0.285 seconds

hive> select count(*) from test_2 where A_COL='2016-07-27';
OK
0
Time taken: 35.016 seconds, Fetched: 1 row(s)

hive> select * from test_2 where A_COL='2016-07-27';
OK
Time taken: 0.165 seconds

hive> select distinct(A_COL) from test_2;
2016-07-27
Time taken: 24.528 seconds, Fetched: 1 row(s)

Upvotes: 1

Related Questions