J Anderson
J Anderson

Reputation: 31

redshift partition external table by part of a string

I haven't seen an answer after days of searching - so here goes

I have an Athena database with a table foo. Adding it to Redshift I use this command:

create external schema athena_schema from data catalog 
database 'my-catalog-db' 
iam_role '...role/my_redshift_role';

my table foo has 45 fields and one of them is a timestamp stored as a string. I want to partition the data in the table by JUST the date portion of the string.

The string looks like '2021/02/09 20:10:09:001', lets call it mydate

So I've tried this:

alter table athena_schema.foo
partition(left(mydate, 10) = '2021/02/09')
location 's3://my s3 location/foo_2021_02_09/';

And Redshift doesn't like the substring commands on an existing field. I've tried them. Any ideas? Thanks for your time.

Upvotes: 0

Views: 4660

Answers (1)

Joe Harris
Joe Harris

Reputation: 14035

When you define a partition in a Redshift Spectrum (and Athena) external table the partition column becomes a separate column in your table. That means that you cannot map a partition onto a column that also exists in the table data file.

In the example DDL from "Partitioning Redshift Spectrum external tables" you can see that the partition column saledate is added as another column in the table.

CREATE EXTERNAL TABLE spectrum.sales_part (
      salesid     INTEGER
    , listid      INTEGER
    , sellerid    INTEGER
    , buyerid     INTEGER
    , eventid     INTEGER
    , dateid      SMALLINT
    , qtysold     SMALLINT
    , pricepaid   DECIMAL(8,2)
    , commission  DECIMAL(8,2)
    , saletime    TIMESTAMP               )
PARTITIONED BY (saledate CHAR(10))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/'
TABLE PROPERTIES ('numRows'='172000');

--Add partitions
ALTER TABLE spectrum.sales_part ADD
PARTITION (saledate='2008-01') LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/'
PARTITION (saledate='2008-02') LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/'
PARTITION (saledate='2008-03') LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03/';

--Query using partition column `saledate`
SELECT TOP 5 
       spectrum.sales_part.eventid
     , SUM(spectrum.sales_part.pricepaid)
FROM spectrum.sales_part, event
WHERE spectrum.sales_part.eventid = event.eventid
  AND spectrum.sales_part.pricepaid > 30
  AND saledate = '2008-01'
GROUP BY spectrum.sales_part.eventid
ORDER BY 2 DESC;

Upvotes: 1

Related Questions