Rodrigo Ney
Rodrigo Ney

Reputation: 373

Redshift Spectrum partitioning a table using two date fields

I was searching for best practices to create partitions by date, using amazon-redshift-spectrum, but the examples shows the problem being solved by partitioning the table by one date only. What to do if I have more than one date field?

Eg: Mobile events with user_install_date and event_date

How performative is to partition your s3 like:

   installdate=2015-01-01/eventdate=2017-01-01
   installdate=2015-01-01/eventdate=2017-01-02
   installdate=2015-01-01/eventdate=2017-01-03

Will It kill my select performance ? What is the best strategy in this case?

Upvotes: 1

Views: 1170

Answers (1)

John Rotenstein
John Rotenstein

Reputation: 269151

If you data was partitioned in the above manner, then a query that merely had eventdate in the WHERE clause (without installdate) would be less efficient.

It would still need to look through every installdate directory, but it could skip over eventdate directories that do not match the predicate.

Put the less-used parameter second.

Upvotes: 1

Related Questions