Redshift DISTSTYLE KEY. Deciding whats the best column to define as KEY

Question

Well I recently got into this area of Redshift, trying to optimize disk usage and performance of my database, and having read lots of information on AWS about the topic, I still have some doubts. First of all, to my database structure. Per schema, I have 3 master tables, with 3 different IDs, these are now DISTSTLYE ALL tables, being small in size.

Each master table has different amounts of IDs,

the date table --> largest one (#1 most joined)

the store table --> medium one (#3 most joined)

the item table --> smallest one (#2 most joined)

Then I have my core table, which has needed combinations of these IDs to display additional information about them. Anyway, this table should be a DISTSTYLE KEY type, based on my knowledge. Well, which of the 3 IDs should I select to be my DIST KEY?

Whats the criteria for this decision? I understand that for joins I need to look at the Sort Key, well that has been understood and defined to the ID_date, because its the most joined table. So now, what about the distribution per node of this table?

I'm sorry if I'm rambling, I dont want to leave any information out. If I have, feel free to ask! Thanks for taking the time to read!

Redshift DISTSTYLE KEY. Deciding whats the best column to define as KEY

Answers (1)

Related Questions