Reputation: 75
I am using HIVE with two tables looking like (more or less):
-TABLE1 defined as [(Variables : string),(Value1 : int),(Value2 : int)]
with field "Variables" looking like "x0,x1,x2,x3,...,xn"
-TABLE2 define as [(Value1Sum : int),(Value2Sum : int),(X1 : string),(X4 : string),(X17 : string)]
I "convert" table1 to table2 with the query :
INSERT OVERWRITE TABLE table2
SELECT sum(v1), sum(v2), x1, x4, x17
FROM (SELECT
Value1 as v1,
Value2 as v2,
split(Variables, ",")[1] as x1,
split(Variables, ",")[4] as x4,
split(Variables, ",")[17] as x17
FROM Table1) tmp
GROUP BY tmp.x1, tmp.x4, tmp.x17
Does Hive call 3 times the split function ?
Is there a way to make it more elegant ?
Is there a way to make it more generic ?
Best regards, CC
Upvotes: 3
Views: 8730
Reputation: 8269
Yes it will call split each time. You can make it slightly more elegant:
Why not define Variables as an array column to start with? They you can access elements directly:
select Varaibles[1] from table1
I'm assuming you're using an external table, so you can do that like so:
create external table table1(variables array<string>, a int, b int)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY ','
LOCATION 'hdfs://somewhere'
Upvotes: 3