Reputation: 11
My team has an application with several configurations, each containing some query data. Each configuration corresponds to a set of queries and business logic written as Glue jobs. When a configuration is executed, its queries run on AWS Athena and/or AWS Glue to produce a final result. Depending on the business logic for the configuration, the Glue job sometimes performs other operations as well.

We want to build a testing framework for this business logic. The idea is to provide some dummy data, either locally or as an S3 file/table, run the business logic for a configuration against it as usual (including all the Athena queries and Glue queries/operations), and compare the filtered output with some 'expected data'.

We want to do this with local computation instead of actual calls to AWS Athena or Glue, to keep our costs in check. If anyone has any suggestions, please let me know.
I found something for AWS Glue (developing with a Docker image), but couldn't find anything similar for Athena.
Upvotes: 1
Views: 564
Reputation: 142008
Athena is based on Presto/Trino (depending on the engine version you are using), so you can use the Docker images for those (Trino, Presto). Note, however, that the API surface can differ a bit, so for "honest" integration testing it is still better to use Athena itself.
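To illustrate the "run the query on dummy data and match against expected data" part, here is a minimal sketch of a comparison helper. It only assumes a Python DB-API 2.0 connection, so in principle it could be handed a connection to a local Trino container (for example one created with `trino.dbapi.connect` from the `trino` Python package; that wiring is an assumption and is not shown). To keep the sketch runnable without any server, the demo uses an in-memory `sqlite3` database purely as a stand-in. SQLite's SQL dialect is not Athena's, and the `orders` table and the helper names are hypothetical.

```python
import sqlite3
from collections import Counter


def diff_query_result(conn, sql, expected_rows):
    """Run `sql` on a DB-API connection and compare rows with `expected_rows`.

    Rows are compared as a multiset, so row order is ignored but duplicate
    rows still count. Returns (missing, unexpected) lists of row tuples.
    """
    cur = conn.cursor()
    cur.execute(sql)
    # Normalise to tuples: some drivers return lists, others tuples.
    actual = Counter(tuple(row) for row in cur.fetchall())
    expected = Counter(tuple(row) for row in expected_rows)
    missing = list((expected - actual).elements())
    unexpected = list((actual - expected).elements())
    return missing, unexpected


def make_demo_connection():
    """Build a small in-memory table of dummy data (stand-in engine only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "paid", 10.0), (2, "cancelled", 5.0), (3, "paid", 7.5)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    missing, unexpected = diff_query_result(
        conn,
        "SELECT id, amount FROM orders WHERE status = 'paid'",
        expected_rows=[(1, 10.0), (3, 7.5)],
    )
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Because the helper depends only on `cursor()`, `execute()` and `fetchall()`, the same test cases could be pointed at a local engine for cheap runs and at Athena itself for occasional "honest" runs, assuming a DB-API compatible driver is available for each.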
Also, Athena charges per data scanned ($5.00 per TB of data scanned), so if your dummy data is small, I would argue it would not cost that much to just use Athena itself.
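To put a rough number on that, here is a small sketch that estimates the cost of a test suite from the bytes each query scans. The $5.00 per TB figure is the one quoted above. Two things are assumptions added here and should be checked against the current Athena pricing page: that scanned bytes are rounded up to the nearest MB with a 10 MB minimum per query, and that a TB is counted as 1024^4 bytes. The function names are hypothetical.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4            # assumption: binary terabyte
PRICE_PER_TB_USD = 5.00   # figure quoted in the answer
MIN_BILLED_MB = 10        # assumption: per-query minimum, verify on the pricing page


def estimate_query_cost(bytes_scanned):
    """Estimated USD cost of one query that scans `bytes_scanned` bytes."""
    # Round up to whole megabytes, then apply the per-query minimum.
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * PRICE_PER_TB_USD


def estimate_suite_cost(bytes_per_query, runs):
    """Estimated USD cost of running every query in the suite `runs` times."""
    return sum(estimate_query_cost(b) for b in bytes_per_query) * runs


if __name__ == "__main__":
    # Hypothetical suite: 50 queries, each scanning 2 MB of dummy data,
    # executed 100 times (for example once per CI build).
    suite = [2 * MB] * 50
    print(f"estimated cost: ${estimate_suite_cost(suite, runs=100):.4f}")
```

Under these assumptions, small dummy datasets are billed at the per-query minimum, so the total is driven by the number of queries and runs rather than by the data size.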
Upvotes: 1