Reputation: 17
I'm new to hive and trying to update a table (table_A) with a new column based on the values retrieved from another table (table_B). table_B is used as a lookup table for the categorical variable descriptions.
Ex: table_A
index | field_x | field_y |
---|---|---|
1 | L | S |
2 | H | H |
3 | M | S |
Ex: table_B
variable_name | variable_category | variable_category_description |
---|---|---|
field_x | L | Low |
field_x | M | Medium |
field_x | H | High |
field_y | S | Soft |
field_y | H | Hard |
Based on the table_A and table_B I need to obtain following table.
index | field_x | field_x_description | field_y | field_y_description |
---|---|---|---|---|
1 | L | Low | S | Soft |
2 | H | High | H | Hard |
3 | M | Medium | S | Soft |
I tried following in hue hive editor just to add one column at the beginning. My editor does not support update statements.
CREATE TABLE table_C AS
SELECT index,
field_x,
field_y,
(SELECT variable_category_description
FROM table_B
where table_B.variable_name = 'field_x' and table_B.variable_category= table_A.field_x
AS field_x_description
FROM table_A
I get following error
Error while compiling statement: FAILED: ParseException line 5:5 cannot recognize input near 'SELECT' 'variable_category_description' 'FROM' in expression specification
A fix for this is much appreciated.
Upvotes: 1
Views: 200
Reputation: 38325
You need two joins (INNER or LEFT) with table_B
CREATE TABLE table_C AS
SELECT a.index,
a.field_x,
x.variable_category_description AS field_x_description,
a.field_y,
y.variable_category_description AS field_y_description
FROM table_A
LEFT JOIN table_B x ON x.variable_name = 'field_x'
and x.variable_category = a.field_x
LEFT JOIN table_B y ON y.variable_name = 'field_y'
and y.variable_category = a.field_y
;
Upvotes: 2