Reputation: 31
I have a table in the following format
ID Property Value
1 name Tim
1 location USA
1 age 30
2 name Jack
2 location UK
2 age 27
And I would like an output in the following format
ID name location age
1 Tim USA 30
2 Jack UK 27
In python I can do
table_agg = table.groupby('ID')[['Property','Value']].apply(lambda x: dict(x.values))
p = pd.DataFrame(list(table_agg))
How to write the query in Hive?
Upvotes: 2
Views: 99
Reputation: 31540
You can use collect_list,map functions to group the data then access the array
based on key.
Example:
hive> create table t1(id int,property string,valu string) stored as orc;
hive> insert into t1 values(1,"name","Tim"),(1,"location","USA"),(1,"age","30"),(2,"name","Jack"),(2,"location","UK"),(2,"age","27");
hive> select id,
va[0]["name"]name,
va[1]["location"]location,
va[2]["age"]age
from (
select id,collect_list(map(property,value))va
from <table_name> group by id
)t;
Result:
id name location age
1 Tim USA 30
2 Jack UK 27
Upvotes: 2