Reputation: 121
I have a very large volatile table. I set the primary index to customer_id, which is the most unique column on the table. However, I can make the primary index more unique by combining it with customer_id, effective_dt, and group_id.
I can make the primary index even more unique by adding more fields, but I was wondering if there is a point of diminishing return for adding more columns?
Also, I noticed that as the more columns I add a unique primary index, the longer it takes to create the volatile table. Just my observation, but I am not sure if this is true?
Thanks in advance.
Upvotes: 0
Views: 529
Reputation: 136
Am I wrong in thinking that if your table is unique at customer level, then adding also the date field doesn't make it anymore unique?
Upvotes: 0
Reputation: 60482
Basically the PI of a table should be based on following criteria (in order of importance):
Access: Pick a column (or column combination) which is accessed a lot. In best case used for joins (i.e. a FK) plus possibly for WHERE-conditions based on equality
Distribution: The table should not be skewed too much (much might be defined differently for different customers). Usually skew up to 10-30% (AMP with maximum size for this table has 10-30% more data than average) is acceptable. Query dbc.TableSizeV to get the actual Permspace used on each AMP for a table.
Volatility: PI should be stable, not lots of updates.
The Primary Key of a table perfectly fits for two out of three, but might not fit the most important one.
In your case customer_id
sounds like a column which will be joined and has a good distribution, so it's probably better than adding more columns to the PI.
The time for creating a Volatile Table should hardly differ with different PIs (unless one is skewed) and I would expect more unique PIs to be a bit more efficient.
Upvotes: 1