Raj
Raj

Reputation: 149

Hive: Joining two tables with different keys

I have two tables like below. Basically i want to join both of them and expected the result like below.

First 3 rows of table 2 does not have any activity id just empty.

All fields are tab separated. Category "33" is having three description as per table 2.

We need to make use of "Activity ID" to get the result for "33" category as there are 3 values for that.

could anyone tell me how to achieve this output?

TABLE: 1

Empid Category ActivityID

44126 33 TRAIN

44127 10 UFL

44128 12 TOI

44129 33 UNASSIGNED

44130 15 MICROSOFT

44131 33 BENEFITS

44132 43 BENEFITS

TABLE 2:

Category ActivityID Categdesc

10 billable

12 billable

15 Non-billable

33 TRAIN Training

33 UNASSIGNED Bench

33 BENEFITS Benefits

43 Benefits

Expected Output:

44126 33 Training

44127 10 Billable

44128 12 Billable

44129 33 Bench

44130 15 Non-billable

44131 33 Benefits

44132 43 Benefits

Upvotes: 0

Views: 3616

Answers (2)

Rags
Rags

Reputation: 1881

It's little difficult to do this Hive as there are many limitations. This is how I solved it but there could be a better way.

I named your tables as below. Table1 = EmpActivity Table2 = ActivityMas

The challenge comes due to the null fields in Table2. I created a view and Used UNION to combine result from two distinct queries.

Create view actView AS Select * from ActivityMas Where Activityid ='';

SELECT * From (
Select EmpActivity.EmpId, EmpActivity.Category, ActivityMas.categdesc
from EmpActivity JOIN ActivityMas 
ON  EmpActivity.Category =  ActivityMas.Category 
AND EmpActivity.ActivityId = ActivityMas.ActivityId
UNION ALL
Select EmpActivity.EmpId, EmpActivity.Category, ActView.categdesc from EmpActivity
JOIN ActView ON  EmpActivity.Category =  ActView.Category
)

You have to use top level SELECT clause as the UNION ALL is not directly supported from top level statements. This will run total 3 MR jobs. ANd below is the result I got.

44127   10      billable
44128   12      billable
44130   15      Non-billable
44132   43      Benefits
44131   33      Benefits
44126   33      Training
44129   33      Bench

Upvotes: 1

Brenden Brown
Brenden Brown

Reputation: 3215

I'm not sure if I understand your question or your data, but would this work?

select table1.empid, table1.category, table2.categdesc
from table1 join table2
on table1.activityID = table2.activityID;

Upvotes: 1

Related Questions