Left outer join query with Null check in hive is not working

Question

I have written a query to find records which exists in one table but not in the other. Here is the query

CREATE TABLE kai_prod.Held1Miss AS 
SELECT distinct jndheld.icerecordid from 
(
SELECT held1.icerecordid,held2.icerecordid icerecordid2 
FROM kai_prod.tblheld116 held1 LEFT OUTER JOIN
kai_prod.tblheld216 held2 ON
held1.icerecordid = held2.icerecordid
) jndheld
WHERE jndheld.icerecordid2 IS NULL;

I am getting a count of 266585 in kai_prod.Held1Miss

however when i double check using the below query , I found that the above count is wrong . The above count should have been zero

select count(*) from kai_prod.tblheld116 INNER JOIN
kai_prod.Held1Miss ON
tblheld116.icerecordid = Held1Miss.icerecordid;

266585

is this a bug in hive or issue with NUll?

please find the version details.

We are running hadoop with spark on Microsoft HDInsight distribution in Azure

hive --version Hive 1.2.1000.2.6.2.3-1 Subversion git://ctr-e134-1499953498516-137771-01-000004.hwx.site/grid/0/jenkins/workspace/HDP-parallel-ubuntu16/SOURCES/hive -r 4e24791861b0ce72f1185e572ffe1fd7f737dfdd Compiled by jenkins on Thu Sep 14 08:04:15 UTC 2017 From source with checksum 77da03c67efbe84d5e5b69bd4d91ad54

hadoop version Hadoop 2.7.3.2.6.2.3-1 Subversion git@github.com:hortonworks/hadoop.git -r 1ceeb58bb3bb5904df0cbb7983389bcaf2ffd0b6 Compiled by jenkins on 2017-09-14T07:25Z Compiled with protoc 2.5.0 From source with checksum 90b73c4c185645c1f47b61f942230 This command was run using /usr/hdp/2.6.2.3-1/hadoop/hadoop-common-2.7.3.2.6.2.3-1.jar

Rahul · Accepted Answer

Left Outer Join will always produce rows from left table.

Please use inner join for getting required results.

Thank you

Left outer join query with Null check in hive is not working

Answers (2)

Related Questions