Reputation: 24144
These are two tables below-
CREATE EXTERNAL TABLE IF NOT EXISTS Table1 (This is the MAIN table through which comparisons need to be made)
(
ITEM_ID BIGINT,
CREATED_TIME STRING,
BUYER_ID BIGINT
)
CREATE EXTERNAL TABLE IF NOT EXISTS Table2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)
As BUYER_ID and USER_ID they both are same thing.
I need to find the total COUNT and all those BUYER_ID that are not there in Table2 by comparing from Table1. So I think it's a kind of Left Outer Join Query. I am new to HiveSql stuff so I am having problem to figure out what should be the actual syntax to do this in HiveQL. I wrote the below SQL Query. Can anyone tell me whether the SQL query below is fine or not to achieve my scenario?
SELECT COUNT(BUYER_ID), BUYER_ID
FROM Table1 dw
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
GROUP BY BUYER_ID;
Upvotes: 1
Views: 3364
Reputation: 77657
If I understand your requirements correctly, I think you are almost there. It seems you only need to add a condition checking if there's no match between the two tables:
SELECT COUNT(BUYER_ID), BUYER_ID
FROM Table1 dw
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
WHERE dps.USER_ID IS NULL
GROUP BY BUYER_ID;
The above will filter out BUYER_IDs that do have matches in Table2, and will show the remaining BUYER_IDs and their corresponding count values. (Well, that's what I understand you want.)
Upvotes: 2