Kirk Broadhurst
Kirk Broadhurst

Reputation: 28698

Athena partition locations

I can view all the partitions on my table using

show partitions my_table

and I can see the location of a partition by using

describe formatted my_table partition (partition_col='value')

but I have a lot of partitions, and don't want to have to parse the output of describe formatted if it can be avoided.

Is there a way to get all partitions and their locations, in a single query?

Upvotes: 5

Views: 6147

Answers (3)

Dror
Dror

Reputation: 13051

Using boto3 (as of version 1.12.9) the following is returning the complete list:

glue_client = boto3.client("glue")
glue_paginator = glue_client.get_paginator("get_partitions")
pages_iter = glue_paginator.paginate(
    DatabaseName=db_name, TableName=table_name
)
res = []
for page in pages_iter:
    for partition in page["Partitions"]:
        res.append(
            {
                "Values": partition["Values"],
                "Location": partition["StorageDescriptor"]["Location"],
            }
        )

Upvotes: 0

Theo
Theo

Reputation: 132862

The cheapest way to get the locations of the partitions of a table is to use the GetPartitions call from the Glue API. It will list all partitions, their values and locations. You can try it out using the AWS CLI tool like this:

aws glue get-partitions --region us-somewhere-1 --database-name your_database --table-name the_table

Using SQL like SELECT DISTINCT partition_col, "$path" FROM the_table could be expensive since Athena unfortunately scans the whole table to produce the output (it could have just looked at the table metadata but that optimization does not seem to exist yet).

Upvotes: 5

Kirk Broadhurst
Kirk Broadhurst

Reputation: 28698

There's no built in or consistent way to get this information.

Assuming you know your partition column(s), you can get this information with a query like

select distinct partition_col, "$path" from my_table

Upvotes: 6

Related Questions