Reputation: 2100
I'm facing the problem that my AWS cost report shows more data used by Athena queries than I'd expect. In the AWS Console (the web UI) there is an Athena History available showing the data scanned. See screenshot:
I'd like to be able to access this query information for a larger timeframe to analyse and optimise my Athena usage. I'd be good to also enable some logging or monitoring service if needed.
Where can I get this query log?
Upvotes: 1
Views: 609
Reputation: 2668
I'd suggest to use AWS SDK, e.g. boto3 library for python. In particular, you need to combine
Here is a quick example how your script could look like:
import boto3
client = boto3.client('athena')
response = client.list_query_executions()
for query_id in response['QueryExecutionIds']:
query_execution_response = client.get_query_execution(
QueryExecutionId=query_id
)
query_string = query_execution_response['QueryExecution']['Query']
query_exec_stats = query_execution_response['QueryExecution']['Statistics']
query_data_scanned = query_exec_stats['DataScannedInBytes']
query_exec_time = query_exec_stats['TotalExecutionTimeInMillis']
print(f"=> Query id: {query_id}")
print(f"Executed SQL statement: {query_string}")
print(f"Total execution time: {query_exec_time}")
print(f"Total data scanned: {query_data_scanned}")
Note, that if you have run a lot of different queries, then you'd better to use paginator.
Upvotes: 2