Reputation: 301
I have written a query to find records which exists in one table but not in the other. Here is the query
CREATE TABLE kai_prod.Held1Miss AS
SELECT distinct jndheld.icerecordid from
(
SELECT held1.icerecordid,held2.icerecordid icerecordid2
FROM kai_prod.tblheld116 held1 LEFT OUTER JOIN
kai_prod.tblheld216 held2 ON
held1.icerecordid = held2.icerecordid
) jndheld
WHERE jndheld.icerecordid2 IS NULL;
I am getting a count of 266585 in kai_prod.Held1Miss
however when i double check using the below query , I found that the above count is wrong . The above count should have been zero
select count(*) from kai_prod.tblheld116 INNER JOIN
kai_prod.Held1Miss ON
tblheld116.icerecordid = Held1Miss.icerecordid;
266585
is this a bug in hive or issue with NUll?
please find the version details.
We are running hadoop with spark on Microsoft HDInsight distribution in Azure
hive --version Hive 1.2.1000.2.6.2.3-1 Subversion git://ctr-e134-1499953498516-137771-01-000004.hwx.site/grid/0/jenkins/workspace/HDP-parallel-ubuntu16/SOURCES/hive -r 4e24791861b0ce72f1185e572ffe1fd7f737dfdd Compiled by jenkins on Thu Sep 14 08:04:15 UTC 2017 From source with checksum 77da03c67efbe84d5e5b69bd4d91ad54
hadoop version Hadoop 2.7.3.2.6.2.3-1 Subversion [email protected]:hortonworks/hadoop.git -r 1ceeb58bb3bb5904df0cbb7983389bcaf2ffd0b6 Compiled by jenkins on 2017-09-14T07:25Z Compiled with protoc 2.5.0 From source with checksum 90b73c4c185645c1f47b61f942230 This command was run using /usr/hdp/2.6.2.3-1/hadoop/hadoop-common-2.7.3.2.6.2.3-1.jar
Upvotes: 0
Views: 1120
Reputation: 11
i think your second sql maybe wrong:
select count(*) from kai_prod.tblheld116 INNER JOIN
kai_prod.Held1Miss ON
tblheld116.icerecordid = Held1Miss.icerecordid;
you should join the table kai_prod.tblheld216 rather than kai_prod.tblheld116 to double check the null key
Upvotes: 1
Reputation: 409
Left Outer Join will always produce rows from left table.
Please use inner join for getting required results.
Thank you
Upvotes: 1