Cube Operators on Impala

Question

In doing benchmarks between Impala and PrestoDB, we noticed that building pivot tables is quite difficult in Imapala because it does not have Cube operators, like Presto does. Here are two examples in Presto:

The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:`

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);

is equivalent to:

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
 (origin_state, destination_state),
 (origin_state),
 (destination_state),
 ());

Another example is the ROLLUP operator. Full documentation is here: https://prestodb.io/docs/current/sql/select.html.

It's not syntatic sugar because PRESTO perform one table scan for whole query - so using this operators you can build pivot table in one request Impala need to run 2-3 queries.

Is there a way in which we can do this with one query / table-scan in Impala instaead of 3? Otherwise the performance becomes terrible on creating any type of pivot table.

Kanagaraj Dhanapal · Accepted Answer

We can use impala windo functions but instead of single column output, you will get 3 columns.

SELECT origin_state,
        destination_state,
        SUM(package_weight) OVER (PARTITION BY origin_state, destination_state) AS pkgwgrbyorganddest,
        SUM(package_weight) OVER (PARTITION BY origin_state) AS pkgwgrbyorg,
        SUM(package_weight) OVER (PARTITION BY destination_state) AS pkgwgrbydest
 FROM shipping;

Cube Operators on Impala

Answers (1)

Related Questions