Reputation: 605

How to effectively implement GROUP BY combination of columns in HiveQL?

For example, I have the following table named Roll:

ID    Name    Address
---------------------
01    Lily    NewYork
02    Lucy    NewYork
03    Lucy    NewYork

and I want to get COUNT(1) GROUP BY combination of column Name and Address:

SELECT Name, Address, COUNT(1) FROM Roll GROUP BY Name, Address
+
SELECT Name, COUNT(1) FROM Roll GROUP BY Name
+
SELEC Address, COUNT(1) FROM Roll GROUP BY Address
+
SELECT COUNT(1) FROM Roll

The following SQL can realize my idea and '##' represents 'GROUP BY NONE':

SELECT Name, Address, COUNT(1) FROM (
SELECT Name, Address FROM Roll
UNION ALL
SELECT '##', Address FROM Roll
UNION ALL
SELECT Name, '##' FROM Roll
UNION ALL
SELECT '##', '##' FROM Roll) t
GROUP by Name, Address;

The result:

+------+---------+----------+
| Name | Address | COUNT(1) |
+------+---------+----------+
| ##   | ##      |        3 |
| ##   | NewYork |        3 |
| Lily | ##      |        1 |
| Lily | NewYork |        1 |
| Lucy | ##      |        2 |
| Lucy | NewYork |        2 |
+------+---------+----------+

Is the more efficient way to implement it except the above one?

Thanks.

Upvotes: 1

Answers (2)

Partha Kaushik

Reputation: 690

SELECT coalesce(Name,"##"), coalesce(Address,"##"), count(1)
FROM ROLL
GROUP BY Name, Address with cube;

I guess this is what you were looking for :)

Upvotes: 0

Angelo Di Donato

Reputation: 1093

are you looking for sub-totals? if so that can be achieved with grouping sets and cube/rollup. check this wiki about grouping

Upvotes: 1

How to effectively implement GROUP BY combination of columns in HiveQL?

Answers (2)

Related Questions