Reputation: 786
I have a rather complicated Athena query, which I would like to test on a local machine without connecting to Athena. I specified some mock data for testing purposes, and I was hoping that I could use something simple like SQLite to spin up a local database, populate it with mock data, run the tests and tear the database down.
The issue is that the SQLite dialect is just different enough for my Athena query to fail. What are good practices for this task? Should I connect to actual Athena and create mock data there? Are there any tools that can convert SQL queries between dialects?
Testing is being done with Python.
Upvotes: 0
Views: 929
Reputation: 786
Just for completeness of the post I am sharing an actual practical example of the approach I went with. In short I have written a python script that prepends sql statements, for mocking table data, to the actual production query.
Final query looks something like this:
WITH table_1 AS
(select * from (VALUES (1, 'foo', 'foo_doe'), (2, 'bar', 'bar_doe')) as ata ("id","name","surname"))
, table_2 AS
(select * from (VALUES (1, 'forest', 100), (2, 'desert', 122)) as ata ("id","job_name","salary"))
-- My main production query that uses mock data
select * from table_1
join table_2 on table_1.id = table_2.id;
Mock data is defined in easily readable .json
files and the Python script converts the .json
data to the above sql
format.
For mock table names I used same table names as production tables. But you could also parametrise your production query table names and use different table names in the testing environment.
Upvotes: 1
Reputation: 1553
A possibility to achieve what you suggested it's via localstack.
Here an extensive guide that shown also how to create different table types with mocked data, and then connect to localstack via pyathena.
Anyhow Athena support in localstack seems to be in the Pro version (check the pricing page to have more insights).
If the data that you are dealing with is small enough, using the real athena service from AWS might be the best choice - if that's the case consider to use a separate environment.
Upvotes: 1