user4119502
user4119502

Reputation:

Join multiple tables in Hive

Below is the data set

Table1
col1,col2
key1,k1
key2,k2
key3,k3


Table2
col1,col3
key1,k11
key2,k22
key4,k44


Table3
col1,col4
key1,k111
key2,k222
key5,k555

I need to join the 3 tables based on col1. Below is my query

select a.col1,a.col2,b.col3,c.col4 from table1 a full outer join table2 b full outer join table3 c;

The expected output is shown below

Expected output:

col1,col2,col3,col4
key1,k1,k11,k111
key2,k2,k22,k222
key3,k3,   ,
key4,  ,k44,
key5,  ,   ,k555

Upvotes: 1

Views: 11219

Answers (2)

Ambrish
Ambrish

Reputation: 3677

Since you are doing full outer join, it will be a good idea to do COALESCE for column col1

Something like:

select 
    COALESCE(a.col1, b.col1, c.col1) as col1,
    a.col2,
    b.col3,
    c.col4
from 
    table1 a 
    full outer join table2 b 
      on t1.col1 = t2.col1
    full outer join table3 c
      on t1.col1 = t3.col1;

We need to have join condition as well to avoid cartesian product in the result set.

Upvotes: 2

Andrew
Andrew

Reputation: 8758

  1. Based on your desired result, you need to join all your tables on col1.
  2. To ensure that you always have a value in col1, you need to coalesce them together (coalesce gives you the first non-null value).
  3. To stitch your other columns together (and replace nulls with an empty space, combine coalesce with concat:

Putting that all together:

select
coalesce(t1.col1,t2.col1,t3.col1) as col1,
concat(coalesce(t1.col2,' '),',',coalesce(t2.col2,' '),',',coalesce(t3.col2,' '))
from
table1 t1
full join table2 t2
  on t1.col1 = t2.col1
full join table3 t3
  on t1.col1 = t3.col1

Upvotes: 3

Related Questions