lgao
lgao

Reputation: 31

Aggregate ID-property-value records in Hive

I have a table in the following format

ID  Property  Value

1    name      Tim

1    location  USA

1    age       30

2    name      Jack

2    location  UK

2    age       27

And I would like an output in the following format

ID  name  location age

1   Tim    USA     30

2   Jack   UK      27

In python I can do

table_agg = table.groupby('ID')[['Property','Value']].apply(lambda x: dict(x.values))

p = pd.DataFrame(list(table_agg))

How to write the query in Hive?

Upvotes: 2

Views: 99

Answers (1)

notNull
notNull

Reputation: 31540

You can use collect_list,map functions to group the data then access the array based on key.

Example:

hive> create table t1(id int,property string,valu string) stored as orc;
hive> insert into t1 values(1,"name","Tim"),(1,"location","USA"),(1,"age","30"),(2,"name","Jack"),(2,"location","UK"),(2,"age","27");

hive> select id,
       va[0]["name"]name,
       va[1]["location"]location,
       va[2]["age"]age 
      from (
           select id,collect_list(map(property,value))va 
               from <table_name> group by id
          )t;

Result:

id      name    location        age
1       Tim     USA             30
2       Jack    UK              27

Upvotes: 2

Related Questions