Reputation: 2977
Is there a benefit of using multiple columns on distribution when creating a table? For instance:
CREATE TABLE data_facts (
data_id int primary key,
channel_id smallint,
chart_id smallint,
demo_id smallint,
value numeric)
DISTRIBUTED BY (
channel_id,
chart_id,
demo_id)
as there will be chance I need join data_facts
with three different tables channel
, chart
and demo
using channel_id
, chart_id
and demo_id
respectively.
Specifically,
distribution
and include all id(s)
that I'm using for joining in terms of efficiency?id(s)
matter?Thanks!
Upvotes: 4
Views: 9203
Reputation: 21
No. Multiple hash keys do not provide benefits except when you are doing a hash distribution AND a single key does not provide a reasonably even distribution.
Co-located joins will occur under the following conditions:
If both conditions are not met, one of the tables will be broadcast across all computing nodes, resulting in performance problems and occasional query failure.
The practicality of using distribution has a lot to do with your modeling technique. It is more effective in star schema than 3NF structures. In star schemas it is common to distribute facts on the key of its largest dimension table. Colocation will occur if you join to that dimension. This gives you the greatest benefit since this would be the most time consuming join. Typically, the smaller dimension tables (in the query) wind up being broadcast and held in memory for the join.
Upvotes: 2
Reputation: 72646
It depends on how much you want to shard the database, and on how less records you want to distribute in each partition, I mean if you add more than one column in the distribution you will fragment a lot more the data into more partitions.
It also depends if you shard by modulo or hash ...
However, in my opinion, if you have a multiple columns primary key and you want to shard by this primary key could have a sense distributing by multiple columns(with all the columns in the primary key) otherwise you should shard by a single column that in most cases is enough .
Upvotes: 2