Reputation: 19728
I have a use case where I need to query request URLs from S3 access logs. Amazon has recently introduced Athena to query S3 file contents. What is the best option with respect to cost and performance?
Upvotes: 5
Views: 6545
Reputation: 10864
Athena vs. DynamoDB: if you can functionally achieve your requirement with both, then:
If you only need to query your data rarely, Athena is the better solution; otherwise DynamoDB. Also, if performance is important, DynamoDB is the answer. And if you already have TBs of data in S3, Athena is the way to go, because loading it all into DynamoDB would cost a fortune, unless you really need query results in milliseconds or seconds.
Upvotes: 4
Reputation: 269340
Amazon DynamoDB would be a poor choice for running queries over web logs.
DynamoDB is super-fast, but only if you are retrieving data based upon its Primary Key ("Query"). If you are running a query against ALL data in a table (e.g. to find a particular IP address in an attribute that is not indexed), DynamoDB will need to scan through ALL rows in the table, which takes a lot of time ("Scan"). For example, if your table is configured for 100 reads per second and you are scanning 10,000 rows, it will take 100 seconds (10,000 ÷ 100 = 100).
Tip: Do not do full-table scans in a NoSQL database.
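For illustration, here is a minimal boto3 sketch contrasting the two access patterns; the table name, key and attribute names are hypothetical:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("weblogs")   # hypothetical table name

# Query: fast, targeted read against the partition key
# (the key name "request_id" is assumed for illustration)
hit = table.query(KeyConditionExpression=Key("request_id").eq("abc-123"))

# Scan: reads every item in the table and only then applies the filter,
# so it consumes read capacity for the whole table
full = table.scan(FilterExpression=Attr("client_ip").eq("203.0.113.10"))
```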
Amazon Athena is ideal for scanning log files! There is no need to pre-load data - simply run the query against the logs already stored in Amazon S3. Use standard SQL to find the data you're seeking. Plus, you only pay for the data that is read from disk. The S3 access log format is space-delimited rather than CSV, so you'll need the correct CREATE TABLE statement.
See: Using AWS Athena to query S3 Server Access Logs
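As a rough sketch of what an Athena query over those logs looks like via boto3 (the table, column names, database and results bucket below are placeholders; the CREATE TABLE statement with the correct regex for the access-log format is in the linked article):

```python
import boto3

athena = boto3.client("athena")

# Assumes a table "s3_access_logs" has already been created over the log prefix,
# e.g. with the RegexSerDe CREATE TABLE from the article linked above.
query = """
    SELECT request_uri, COUNT(*) AS hits
    FROM s3_access_logs
    WHERE httpstatus = '200'
    GROUP BY request_uri
    ORDER BY hits DESC
    LIMIT 20
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
```

The call returns a QueryExecutionId; you poll for completion and fetch the results separately, or simply run the same SQL from the Athena console.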
Another choice is to use Amazon Redshift, which can query GBs, TBs and even PBs of data across billions of rows. If you are going to run frequent queries against the log data, Redshift is great. However, being a standard SQL database, you will need to pre-load the data into Redshift. Unfortunately, Amazon S3 log files are not in CSV format, so you would need to ETL the files into a suitable format first. This isn't worthwhile for occasional, ad-hoc requests.
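If you did go the Redshift route, the load step is typically a COPY from S3. A minimal sketch using the boto3 Redshift Data API, assuming the logs have already been ETL'd into CSV under a staging prefix (cluster, database, user, table and IAM role are placeholders):

```python
import boto3

rsd = boto3.client("redshift-data")

# Assumes the raw access logs were already transformed into CSV files
# under s3://my-bucket/transformed-logs/ by a separate ETL step.
copy_sql = """
    COPY weblogs
    FROM 's3://my-bucket/transformed-logs/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV;
"""

rsd.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="analytics",
    DbUser="admin",
    Sql=copy_sql,
)
```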
Many people also like to use Amazon Elasticsearch Service for scanning log files. Again, the file format needs some special handling and the pipeline to load the data needs some work, but the result is near-realtime interactive analysis of your S3 log files.
See: Using the ELK stack to analyze your S3 logs
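A minimal sketch of the indexing side of such a pipeline, using the elasticsearch Python client (8.x keyword arguments assumed; the endpoint, index name and document shape are placeholders, and a real pipeline would parse each log line, e.g. with the same regex Athena uses, before indexing):

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint for an Elasticsearch / OpenSearch domain
es = Elasticsearch("https://my-es-domain.example.com:443")

# One pre-parsed access-log record (field names are illustrative)
doc = {
    "remote_ip": "203.0.113.10",
    "request_uri": "GET /images/logo.png HTTP/1.1",
    "http_status": 200,
    "timestamp": "2017-01-15T10:23:45Z",
}

es.index(index="s3-access-logs", document=doc)
```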
Upvotes: 11
Reputation: 2576
As Deepak mentioned, DynamoDB is faster, but its cost is higher than Athena's. Depending on your use case, a hybrid approach might give you good results in certain scenarios.
You can use DynamoDB to store recent, read-heavy data, while older, rarely read data can be stored in S3 and queried with Athena.
However, implementation-wise this will be a bit more complex (see the sketch below).
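A minimal sketch of that routing logic, with an assumed 7-day cutoff and placeholder table, column and bucket names:

```python
import datetime
import boto3
from boto3.dynamodb.conditions import Key

CUTOFF_DAYS = 7  # assumed boundary between "hot" and "cold" data

def query_logs(request_id: str, day: datetime.date):
    """Serve recent data from DynamoDB, older data via Athena."""
    age_days = (datetime.date.today() - day).days
    if age_days <= CUTOFF_DAYS:
        table = boto3.resource("dynamodb").Table("recent_weblogs")
        resp = table.query(KeyConditionExpression=Key("request_id").eq(request_id))
        return resp["Items"]
    athena = boto3.client("athena")
    # Returns a QueryExecutionId; results must be polled and fetched separately.
    return athena.start_query_execution(
        QueryString=f"SELECT * FROM s3_access_logs WHERE requestid = '{request_id}'",
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
    )
```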
Upvotes: 0