Reputation: 107
I have a star schema in Redshift, and for BI purposes I am trying to create a flat table by joining the fact table with a dimension table. Both tables are huge: the fact table is around 1 TB and the dimension table is around 10 GB.
When I run the join query it fails, even though I can confirm there is free space in the Redshift cluster. As a temporary workaround I am completing the process by running the join and adding one column at a time.
My understanding is that while the join query is running the space requirement is quite high, and once the join completes the space usage comes back down.
Can anyone suggest an efficient way to complete such a join?
Upvotes: 0
Views: 290
Reputation: 12901
You can allocate more memory to a query by setting wlm_query_slot_count (http://docs.aws.amazon.com/redshift/latest/dg/r_wlm_query_slot_count.html) to a higher value.
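A minimal sketch of how that might look, run in the same session as the join (the slot count of 5 is illustrative; it must not exceed the concurrency of your WLM queue, and the setting only lasts for the session):

```sql
-- Claim more of the queue's memory slots for this session (value is an example).
set wlm_query_slot_count to 5;

-- ... run the large join here ...

-- Return to the default so other queries in the queue aren't starved.
set wlm_query_slot_count to 1;
```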
Also check whether it makes sense to have your dimension table replicated across all your nodes with DISTSTYLE ALL (http://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html). It will take more disk space, but it will speed up the join, since a full copy of the dimension is available locally on every node and no data redistribution is needed.
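For example, you could rebuild the dimension with CTAS (table and column names below are placeholders for your own schema):

```sql
-- Replicate a full copy of the dimension to every node so the join runs
-- without redistributing rows across the cluster.
CREATE TABLE dim_customer_all
  DISTSTYLE ALL
  SORTKEY (customer_id)
AS SELECT * FROM dim_customer;
```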
Another option is to flatten the large dimension into the fact table, as you might do in other DWH schemas.
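That is, materialize the join once so BI queries read a single wide table. A sketch, again with placeholder names (selecting only the dimension columns the reports actually need keeps the flat table smaller):

```sql
-- One-time flatten of the dimension attributes into the fact table.
CREATE TABLE fact_events_flat
  DISTKEY (customer_id)
  SORTKEY (event_date)
AS
SELECT f.*,
       d.customer_name,
       d.customer_segment
FROM fact_events f
JOIN dim_customer d
  ON d.customer_id = f.customer_id;
```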
Upvotes: 1