chemdog95

Reputation: 416

AWS Glue API not recognizing partitions with hyphen

I have data in S3 that is partitioned by category and date as follows:

s3://mybucket/category=1/date=2018-08-30/data1.json
s3://mybucket/category=1/date=2018-08-31/data2.json
s3://mybucket/category=2/date=2018-08-30/data3.json
s3://mybucket/category=2/date=2018-08-31/data4.json

After running the crawler, I have two partition keys in my metadata table: one for category and one for date. I want to retrieve the partitions that match certain key values using the GetPartitions API, so I began experimenting with the AWS CLI. If I run this command:

aws glue get-partitions --database-name mydb --table-name mytable --expression "category=1" --region us-west-2

I successfully retrieve the partition as expected. However, I tried the following command:

aws glue get-partitions --database-name mydb --table-name mytable --expression "category=1 AND date=2018-08-30" --region us-west-2

and the response was

An error occurred (InvalidInputException) when calling the GetPartitions operation: Unsupported expression '2018 - 08 - 30'

Another command that produced this error was

aws glue get-partitions --database-name mydb --table-name mytable --expression category=1\ AND\ date=2018-08-30 --region us-west-2

I also tried modifying the call by using the following command:

aws glue get-partitions --database-name mydb --table-name mytable --expression "category=1 AND date=2018\-08\-30" --region us-west-2

which gave me the error

An error occurred (InvalidInputException) when calling the GetPartitions operation: Lexical error at line 1, column 35. Encountered: "\" (92), after : ""

Is the GetPartitions API able to handle expressions for partitions that contain hyphens? If so, what is the correct syntax?

Upvotes: 2

Views: 3211

Answers (1)

chemdog95

Reputation: 416

Partition keys that are initially generated by a crawler in AWS Glue have type String in the metadata catalog. Some of my categories contained hyphens, but they were in UUIDs (e.g. category=so36-fkw1-...), so they were not interpreted as expressions. The dates, on the other hand, contain only digits and hyphens, so the value was parsed as an expression (note the '2018 - 08 - 30' in the error message), which was the root of the problem. I was able to fix it by enclosing the date in single quotes as follows:

aws glue get-partitions --database-name mydb --table-name mytable --expression category=1\ AND\ date=\'2018-08-30\' --region us-west-2
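
The backslash escaping above can be hard to read. As a minimal sketch, assuming a POSIX shell, the same expression can be built with double quotes around the whole filter and single quotes around the date value. The helper name build_expression is hypothetical and only for illustration; the database, table, and region names are the ones from the question.

```shell
# Hypothetical helper: build a GetPartitions filter expression with the
# date value wrapped in single quotes so it is treated as a string literal.
build_expression() {
  category="$1"
  date_value="$2"
  printf "category=%s AND date='%s'" "$category" "$date_value"
}

expr_str="$(build_expression 1 2018-08-30)"
echo "$expr_str"   # prints: category=1 AND date='2018-08-30'

# The call itself needs AWS credentials, so it is left commented out here:
# aws glue get-partitions --database-name mydb --table-name mytable \
#   --expression "$expr_str" --region us-west-2
```

Passing the expression as a single double-quoted argument keeps the inner single quotes intact without any backslashes.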

Upvotes: 7
