Reputation: 4707
I have the following partitioned table in Athena (HIVE/Presto):
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (
id STRING,
data STRING
)
PARTITIONED BY (
year string,
month string,
day string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3://mybucket';
Data is stored in s3 organized in a path structure like s3://mybucket/year=2020/month=01/day=30/
.
I would like to know if the following query would leverage partitioning optimization:
SELECT
*
FROM
mydb.mytable
WHERE
(year='2020' AND month='08' AND day IN ('10', '11', '12')) OR
(year='2020' AND month='07' AND day IN ('29', '30', '31'));
I am assuming since IN
operator will be transformed in a series of OR
conditions, this will be still a query which will benefit by partitioning. Am I correct?
Upvotes: 1
Views: 1989
Reputation: 132862
Unfortunately Athena does not expose information that would make it easier to understand how to optimise queries. Currently the only thing you can do is to run different variations of queries and look at the statistics returned in the GetQueryExecution
API call.
One way to figure out if Athena will make use of partitioning in a query is to run the query with different values for the partition column and make sure that the amount of data scanned is different. If the amount of data is different Athena was able to prune partitions during query planning.
Upvotes: 1
Reputation: 2936
Yes, it's also mentioned int the documentation.
When Athena runs a query on a partitioned table, it checks to see if any partitioned columns are used in the WHERE clause of the query. If partitioned columns are used, Athena requests the AWS Glue Data Catalog to return the partition specification matching the specified partition columns. The partition specification includes the LOCATION property that tells Athena which Amazon S3 prefix to use when reading data. In this case, only data stored in this prefix is scanned. If you do not use partitioned columns in the WHERE clause, Athena scans all the files that belong to the table's partitions.
Upvotes: 1