I have data in S3 that is partitioned by category and date as follows:
s3://mybucket/category=1/date=2018-08-30/data1.json
s3://mybucket/category=1/date=2018-08-31/data2.json
s3://mybucket/category=2/date=2018-08-30/data3.json
s3://mybucket/category=2/date=2018-08-31/data4.json
After running the crawler, I have two partition keys in my metadata table: one for category, the other for date. I want to retrieve partitions that match certain keys using the GetPartitions API, so I began experimenting with the AWS CLI. If I run this command:
aws glue get-partitions --database-name mydb --table-name mytable --expression "category=1" --region us-west-2
I successfully retrieve the partition as expected. However, I tried the following command:
aws glue get-partitions --database-name mydb --table-name mytable --expression "category=1 AND date=2018-08-30" --region us-west-2
and the response was
An error occurred (InvalidInputException) when calling the GetPartitions operation: Unsupported expression '2018 - 08 - 30'
Another command that produced this error was
aws glue get-partitions --database-name mydb --table-name mytable --expression category=1\ AND\ date=2018-08-30 --region us-west-2
I also tried modifying the call by using the following command:
aws glue get-partitions --database-name mydb --table-name mytable --expression "category=1 AND date=2018\-08\-30" --region us-west-2
which gave me the error
An error occurred (InvalidInputException) when calling the GetPartitions operation: Lexical error at line 1, column 35. Encountered: "\" (92), after : ""
Is the GetPartitions API able to handle expressions for partitions that contain hyphens? If so, what is the correct syntax?
Upvotes: 2
Views: 3211
Partition keys that are initially generated by a crawler in AWS Glue have type String in the metadata catalog. While some of my categories contained hyphens, they were in UUIDs (e.g. category=so36-fkw1-...), so they were not interpreted as arithmetic expressions. The dates, on the other hand, contain only numeric characters and -, so an unquoted value such as 2018-08-30 was parsed as a subtraction (hence the '2018 - 08 - 30' in the error message), which was the root of the problem. I was able to fix it by enclosing the dates in single quotes as follows:
aws glue get-partitions --database-name mydb --table-name mytable --expression category=1\ AND\ date=\'2018-08-30\' --region us-west-2
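The same quoting rule applies when calling GetPartitions from code instead of the CLI. Below is a minimal sketch in Python using boto3, assuming the database, table, and region names from the question. The helper names (quote_value, build_expression, fetch_partitions) are hypothetical, and rejecting values that contain a single quote is my own simplification rather than documented Glue behavior.

```python
def quote_value(value):
    """Render one partition value for a GetPartitions expression.

    Integers are left bare (category=1 worked unquoted in the question);
    everything else is wrapped in single quotes so that hyphens are not
    parsed as subtraction.
    """
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "'" in text:
        # Assumption: embedded quotes are simply rejected in this sketch.
        raise ValueError(f"unsupported partition value: {text!r}")
    return f"'{text}'"


def build_expression(filters):
    """Join key=value filters with AND, e.g. category=1 AND date='2018-08-30'."""
    return " AND ".join(f"{key}={quote_value(val)}" for key, val in filters.items())


def fetch_partitions(database, table, filters, region="us-west-2"):
    """Return all partitions matching the filters (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("glue", region_name=region)
    paginator = client.get_paginator("get_partitions")
    partitions = []
    for page in paginator.paginate(
        DatabaseName=database,
        TableName=table,
        Expression=build_expression(filters),
    ):
        partitions.extend(page["Partitions"])
    return partitions


if __name__ == "__main__":
    # Only prints the expression; call fetch_partitions("mydb", "mytable", ...) to query Glue.
    print(build_expression({"category": 1, "date": "2018-08-30"}))
```

On the command line, wrapping the whole expression in double quotes should pass the same string without the backslash escapes: --expression "category=1 AND date='2018-08-30'".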
Upvotes: 7