arsenal
arsenal

Reputation: 24144

Join two tables using HiveQL

These are two tables below-

CREATE EXTERNAL TABLE IF NOT EXISTS Table1   (This is the MAIN table through which     comparisons need to be made) 
( 
ITEM_ID BIGINT, 
CREATED_TIME STRING, 
BUYER_ID BIGINT 
) 

CREATE EXTERNAL TABLE IF NOT EXISTS Table2 
( 
USER_ID BIGINT, 
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>> 
) 

As BUYER_ID and USER_ID they both are same thing.

I need to find the total COUNT and all those BUYER_ID that are not there in Table2 by comparing from Table1. So I think it's a kind of Left Outer Join Query. I am new to HiveSql stuff so I am having problem to figure out what should be the actual syntax to do this in HiveQL. I wrote the below SQL Query. Can anyone tell me whether the SQL query below is fine or not to achieve my scenario?

SELECT COUNT(BUYER_ID), BUYER_ID 
FROM Table1 dw 
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID) 
GROUP BY BUYER_ID;

Upvotes: 1

Views: 3364

Answers (1)

Andriy M
Andriy M

Reputation: 77657

If I understand your requirements correctly, I think you are almost there. It seems you only need to add a condition checking if there's no match between the two tables:

SELECT COUNT(BUYER_ID), BUYER_ID 
FROM Table1 dw 
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID) 
WHERE dps.USER_ID IS NULL
GROUP BY BUYER_ID;

The above will filter out BUYER_IDs that do have matches in Table2, and will show the remaining BUYER_IDs and their corresponding count values. (Well, that's what I understand you want.)

Upvotes: 2

Related Questions