Reputation: 28698
I can view all the partitions on my table using
show partitions my_table
and I can see the location of a partition by using
describe formatted my_table partition (partition_col='value')
but I have a lot of partitions, and don't want to have to parse the output of describe formatted
if it can be avoided.
Is there a way to get all partitions and their locations, in a single query?
Upvotes: 5
Views: 6147
Reputation: 13051
Using boto3
(as of version 1.12.9) the following is returning the complete list:
glue_client = boto3.client("glue")
glue_paginator = glue_client.get_paginator("get_partitions")
pages_iter = glue_paginator.paginate(
DatabaseName=db_name, TableName=table_name
)
res = []
for page in pages_iter:
for partition in page["Partitions"]:
res.append(
{
"Values": partition["Values"],
"Location": partition["StorageDescriptor"]["Location"],
}
)
Upvotes: 0
Reputation: 132862
The cheapest way to get the locations of the partitions of a table is to use the GetPartitions
call from the Glue API. It will list all partitions, their values and locations. You can try it out using the AWS CLI tool like this:
aws glue get-partitions --region us-somewhere-1 --database-name your_database --table-name the_table
Using SQL like SELECT DISTINCT partition_col, "$path" FROM the_table
could be expensive since Athena unfortunately scans the whole table to produce the output (it could have just looked at the table metadata but that optimization does not seem to exist yet).
Upvotes: 5
Reputation: 28698
There's no built in or consistent way to get this information.
Assuming you know your partition column(s), you can get this information with a query like
select distinct partition_col, "$path" from my_table
Upvotes: 6