Reputation: 3013
I have a data lake in AWS S3. The data format is Parquet, and the daily workload is ~70 GB. I want to build some ad-hoc analytics on top of that data. To do that, I see two options:

- AWS Athena, querying the data with HiveQL through the AWS Glue Data Catalog
- Amazon Redshift, loading the data from S3

What is the best way to do ad-hoc analysis in my case? Is there a more efficient way? And what are the pros and cons of the options mentioned?
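For the Athena route, the table is registered once in the Glue Data Catalog and then queried with standard SQL; partitioning by ingest date lets Athena prune partitions so an ad-hoc query scans one day's ~70 GB instead of the whole lake. A minimal sketch, where the database, table, columns, and bucket path are all hypothetical placeholders:

```python
# Hypothetical names -- adjust to your own Glue catalog and bucket.
DATABASE = "datalake"                       # assumed Glue database
TABLE = "events"                            # assumed Glue table
S3_LOCATION = "s3://my-data-lake/events/"   # hypothetical bucket path

# Registering the Parquet data as an external table, partitioned by date.
CREATE_TABLE_DDL = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {DATABASE}.{TABLE} (
  event_id string,
  payload  string
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION '{S3_LOCATION}'
"""

def daily_query(day: str) -> str:
    """Build an ad-hoc query restricted to a single date partition,
    so Athena only scans that partition's objects."""
    return (
        f"SELECT count(*) AS events FROM {DATABASE}.{TABLE} "
        f"WHERE dt = '{day}'"
    )
```

The DDL can be run once from the Athena console (or the table can be created by a Glue crawler), and the queries submitted via the console or `boto3`'s `start_query_execution`.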
PS: After 6 months, I'm going to move the data from S3 to Amazon Glacier, so the maximum data volume to query in S3/Redshift would be ~13 TB.
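The move to Glacier after 6 months can be automated with an S3 lifecycle rule instead of a manual copy. A sketch of the configuration payload that `boto3`'s `put_bucket_lifecycle_configuration` expects, with a hypothetical bucket and prefix:

```python
# Lifecycle rule: transition objects under the prefix to Glacier
# 180 days (~6 months) after creation. Bucket and prefix are
# hypothetical; apply with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-data-lake", LifecycleConfiguration=LIFECYCLE)
LIFECYCLE = {
    "Rules": [
        {
            "ID": "parquet-to-glacier-after-6-months",
            "Filter": {"Prefix": "events/"},  # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }
    ]
}
```

Note that once objects are in Glacier they are no longer directly queryable by Athena or Redshift Spectrum until restored, which is why the ~13 TB ceiling applies only to the data still in S3.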
Upvotes: 0
Views: 244