Reputation: 31
I haven't seen an answer after days of searching - so here goes
I have an Athena database with a table foo. Adding it to Redshift I use this command:
create external schema athena_schema from data catalog
database 'my-catalog-db'
iam_role '...role/my_redshift_role';
my table foo has 45 fields and one of them is a timestamp stored as a string. I want to partition the data in the table by JUST the date portion of the string.
The string looks like '2021/02/09 20:10:09:001', lets call it mydate
So I've tried this:
alter table athena_schema.foo
partition(left(mydate, 10) = '2021/02/09')
location 's3://my s3 location/foo_2021_02_09/';
And Redshift doesn't like the substring commands on an existing field. I've tried them. Any ideas? Thanks for your time.
Upvotes: 0
Views: 4660
Reputation: 14035
When you define a partition in a Redshift Spectrum (and Athena) external table the partition column becomes a separate column in your table. That means that you cannot map a partition onto a column that also exists in the table data file.
In the example DDL from "Partitioning Redshift Spectrum external tables" you can see that the partition column saledate
is added as another column in the table.
CREATE EXTERNAL TABLE spectrum.sales_part (
salesid INTEGER
, listid INTEGER
, sellerid INTEGER
, buyerid INTEGER
, eventid INTEGER
, dateid SMALLINT
, qtysold SMALLINT
, pricepaid DECIMAL(8,2)
, commission DECIMAL(8,2)
, saletime TIMESTAMP )
PARTITIONED BY (saledate CHAR(10))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/'
TABLE PROPERTIES ('numRows'='172000');
--Add partitions
ALTER TABLE spectrum.sales_part ADD
PARTITION (saledate='2008-01') LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/'
PARTITION (saledate='2008-02') LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/'
PARTITION (saledate='2008-03') LOCATION 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03/';
--Query using partition column `saledate`
SELECT TOP 5
spectrum.sales_part.eventid
, SUM(spectrum.sales_part.pricepaid)
FROM spectrum.sales_part, event
WHERE spectrum.sales_part.eventid = event.eventid
AND spectrum.sales_part.pricepaid > 30
AND saledate = '2008-01'
GROUP BY spectrum.sales_part.eventid
ORDER BY 2 DESC;
Upvotes: 1