Jin
Jin

Reputation: 1223

select multiple columns from different tables and join in hive

I have a hive table A with 5 columns, the first column(A.key) is the key and I want to keep all 5 columns. I want to select 2 columns from B, say B.key1 and B.key2 and 2 columns from C, say C.key1 and C.key2. I want to join these columns with A.key = B.key1 and B.key2 = C.key1

What I want is a new external table D that has the following columns. B.key2 and C.key2 values should be given NULL if no matching happened.

A.key, A_col1, A_col2, A_col3, A_col4, B.key2, C.key2 

What should be the correct hive query command? I got a max split error for my initial try.

Upvotes: 0

Views: 4097

Answers (1)

Joe K
Joe K

Reputation: 18424

Does this work?

create external table D as
select A.key, A.col1, A.col2, A.col3, A.col4, B.key2, C.key2
from A left outer join B on A.key = B.key1 left outer join C on A.key = C.key2;

If not, could you post more info about the "max split error" you mentioned? Copy+paste specific error message text is good.

Upvotes: 2

Related Questions