Reputation: 373
Is there a build in function for this in Redshift? Interleaved keys sort of mitigate this somehow. If there can be mix of Sort by Compound then Interleaved then this would be ideal.
Is there a disadvantage of using Partitioned view. Example what I mean as below.
https://sqlsunday.com/2014/08/31/partitioned-views/
However instead of using date, I want to partition by other field type. e.g. Product Type or Country.
so the view will be
Select '1' as Prod_type, 'USA' as Country , * from fact_Sales_1_USA union all Select '2' as Prod_type, 'UK' as Country , * from fact_Sales_2_UK
This way it will prevent query from reading the whole table and just limit to the particular Product Type and Region.
Also is there function to select multiple tables with similar name.
So instead of the view above It will be just, but this requires the partitioned column to be included in the table.
Select * from fact_*
One of disadvantage is our course the update of data slightly more complex?
Will it slow down a query where I don't use the field above?
Upvotes: 1
Views: 1480
Reputation: 4208
There is no built in function for partitioned views in Redshift. Its developers believe that sort and dist key model and encoding allows to handle most of performance issues. Particular solution is heavily dependent on nature of your data and typical queries that you do.
In a specific example that you mentioned, having multiple tables based on product type/country and sorted by some other column won't prevent query planner from scanning all tables if product type/country are not in the sort key. AFAIK Redshift is agnostic about values distribution of columns that are don't participate in the sort/dist keys, so it will scan each table that participates in the union even if the values needed are just in one table by design. If you're going to filter frequently by product type AND country, create a compound sort key (product_type,country,timestamp)
. This way Redshift will scan only blocks of data that satisfy the search condition. If you're going to filter frequently by product type OR country, create an interleaved sort key on the same columns. Interleaved key is better here because if you filter table with compound sort key mentioned above by country (second column) the key won't be used.
If this doesn't alleviate the problem it means your data is so big that your company can afford building an application on top of Redshift that will handle this logic outside SQL (keep different product type / country in separate tables and generate dynamic SQL that queries these specific tables based on the filter value).
As for the function to select from tables with a similar name, there is no such. In Postgres you could query system tables, generate SQL based on the query results and run it right away but Redshift doesn't support dynamic SQL. The solution is above (right sort keys or application on top of Redshift).
Upvotes: 1