Reputation: 1079
I'm new to Hive. My question is: why do we need to use collect_set(col) when performing a GROUP BY?
select singer, collect_set(song) from songlist GROUP BY singer;
I would really appreciate any help. Thanks in advance!
Upvotes: 0
Views: 104
Reputation: 690
Dude!! It is the other way around :)
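An aggregate function such as collect_set(col) combines many rows into a single value, so any non-aggregated column you select alongside it (here, singer) has to appear in a GROUP BY. In your query it is the collect_set(song) that requires the GROUP BY, not the reverse.
So in your case you are collecting all the distinct songs sung by each singer, hence the GROUP BY singer to go with collect_set(song).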
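To make that concrete, here is a rough sketch. The songlist table and its singer and song columns come from your query; the column types and the sample rows are made up purely for illustration.

```sql
-- Hypothetical schema and sample data (only the table and column names come from the question).
CREATE TABLE songlist (singer STRING, song STRING);

INSERT INTO TABLE songlist VALUES
  ('Adele', 'Hello'),
  ('Adele', 'Skyfall'),
  ('Adele', 'Hello'),      -- duplicate row: collect_set keeps only one 'Hello'
  ('Sia',   'Chandelier');

-- One output row per singer, with an array of that singer's distinct songs.
-- The order of elements inside each array is not guaranteed.
SELECT singer, collect_set(song) AS songs
FROM songlist
GROUP BY singer;

-- Without GROUP BY, Hive should reject this query, because singer is
-- neither aggregated nor listed as a grouping column:
-- SELECT singer, collect_set(song) FROM songlist;

-- Dropping singer from the SELECT is fine without GROUP BY; the whole
-- table is then treated as a single group:
SELECT collect_set(song) AS all_songs
FROM songlist;
```

If you want to keep the duplicates instead of removing them, collect_list(song) can be used in the same way.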
Upvotes: 1