Saurabh Agrawal

Reputation: 1365

Redshift: Aggregating data on a large number of dimensions is slow

I have an Amazon Redshift table with about 400M records and 100 columns: 80 dimensions and 20 metrics.

The table is distributed on one of the high-cardinality dimension columns and includes a couple of high-cardinality columns in the sort key.

A simple aggregate query:

Select dim1, dim2, ..., dim60, sum(met1), ..., sum(met15)
From my_table
Group by dim1, ..., dim60

is taking too long. The explain plan looks simple: just a sequential scan and a HashAggregate on the table. Any recommendations on how I can optimize it?

Upvotes: 1

Views: 2334

Answers (2)

AlexYes

Reputation: 4208

1) If your table is heavily denormalized (your 80 dimensions are in fact 20 dimensions with 4 attributes each), it is faster to group by the dimension keys only. If you really need all the dimension attributes, join the aggregated result back to the dimension tables to get them, like this:

with 
groups as (
    select dim1_id,dim2_id,...,dim20_id,sum(met1),sum(met2)
    from my_table
    group by 1,2,...,20
)
select *
from groups
join dim1_table
using (dim1_id)
join dim2_table
using (dim2_id)
...
join dim20_table
using (dim20_id)

If you don't want to normalize your table and you like having all the information in a single row, that's fine to keep as is: in a columnar database, unused columns don't slow queries down. But grouping by 80 columns is definitely inefficient, so they have to be "pseudo-normalized" in the query as shown above.

2) If your dimensions are hierarchical, you can group by the lowest level only and then join the higher-level dimension attributes. For example, if you have country, country region and city with 4 attributes each, there's no need to group by 12 attributes; all you need to do is group by city ID and then join the city attributes, country region and country tables to the city ID of each group.
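A minimal sketch of that idea, assuming hypothetical city_table, region_table and country_table dimension tables keyed by city_id, region_id and country_id (my_table would then only carry city_id):

with 
groups as (
    select city_id, sum(met1) as met1, sum(met2) as met2
    from my_table
    group by 1
)
select *
from groups
join city_table        -- city attributes + region_id
using (city_id)
join region_table      -- region attributes + country_id
using (region_id)
join country_table     -- country attributes
using (country_id)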

3) You can store the combination of dimension IDs, joined with some delimiter like "-", in a separate varchar column and use that as the sort key.
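For illustration only (all names here are made up, and the concatenated key has to be built when the data is loaded, because the sort key is fixed at table creation):

create table my_table_sorted (
    dim_key  varchar(64),   -- concatenated dimension ids, e.g. '12-345-6789'
    dim1_id  int,
    dim2_id  int,
    dim3_id  int,
    met1     decimal(18,2)
)
sortkey (dim_key);

insert into my_table_sorted
select cast(dim1_id as varchar) || '-' || cast(dim2_id as varchar) || '-' || cast(dim3_id as varchar),
       dim1_id, dim2_id, dim3_id, met1
from my_table;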

Upvotes: 1

John Rotenstein

Reputation: 269370

Sequential scans are quite normal for Amazon Redshift. Instead of using indexes (which themselves would be Big Data), Redshift uses parallel clusters, compression and columnar storage to provide fast queries.

Normally, optimization is done via the following (see the DDL sketch after the list):

  • DISTKEY: Typically used on the most-JOINed column (or most GROUPed column) to localize joined data on the same node.
  • SORTKEY: Typically used for fields that most commonly appear in WHERE statements to quickly skip over storage blocks that do not contain relevant data.
  • Compression: Redshift automatically compresses data, but over time the skew of data could change, making another compression type more optimal.
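As a rough illustration only (all names here are hypothetical, not from the question), a table designed along these lines might be declared like this:

create table my_fact_table (
    customer_id  int           encode az64,   -- the most-joined column -> DISTKEY
    event_date   date          encode az64,   -- the most common WHERE filter -> SORTKEY
    dim1_id      int           encode az64,
    met1         decimal(18,2) encode az64
)
distkey (customer_id)
sortkey (event_date);

ANALYZE COMPRESSION my_fact_table; can then be run from time to time to check whether the chosen encodings still fit the data.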

Your query is quite unusual in that you are using GROUP BY on 60 columns across all rows in the table. This is not a typical Data Warehousing query (where rows are normally limited by WHERE and tables are connected by JOIN).

I would recommend experimenting with fewer GROUP BY columns and breaking the query down into several smaller queries via a WHERE clause to determine what is occupying most of the time. Worst case, you could run the query nightly and store the results in a table for later querying.
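If you go the nightly route, a sketch (names are illustrative) could be as simple as rebuilding a summary table on a schedule:

-- rebuild the pre-aggregated summary, e.g. from a nightly scheduled job
drop table if exists my_table_summary;

create table my_table_summary as
select dim1, dim2, dim3,              -- only the dimensions actually needed downstream
       sum(met1) as met1,
       sum(met2) as met2
from my_table
group by 1, 2, 3;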

Upvotes: 1
