mahesh salunke

Reputation: 103

AWS Athena: reduce scan size

How can I reduce the amount of data scanned by a SELECT query in AWS Athena, for example by scanning only one of the columns?

Example: SELECT * FROM TABLE1 WHERE STATUS='Fail';

Upvotes: 0

Views: 2450

Answers (2)

Abhijeet Gaikwad

Reputation: 41

See Athena performance tuning tips. This AWS blog has multiple tips on reducing data scanned as well as improving performance. Major ones that I see are:

Upvotes: 4

Zerodf

Reputation: 2298

The simplest way to reduce the scan size would be to partition the data by the STATUS value.
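
As a rough sketch of what that could look like: the example below assumes the raw data is comma-delimited text laid out under Hive-style prefixes such as `status=Fail/`. The table name `table1_by_status`, the `id` and `message` columns, and the `s3://my-bucket/...` location are all hypothetical placeholders, not taken from the question.

```sql
-- Hypothetical table, column, and bucket names; adjust to your data.
-- Files are assumed to live under Hive-style prefixes, e.g.
--   s3://my-bucket/table1/status=Fail/...
--   s3://my-bucket/table1/status=Pass/...
CREATE EXTERNAL TABLE table1_by_status (
  id      string,
  message string
)
PARTITIONED BY (status string)   -- the partition column is not repeated in the column list
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/table1/';

-- Register the partitions that already exist under the location.
MSCK REPAIR TABLE table1_by_status;

-- With a filter on the partition column, only the status=Fail prefix is read.
SELECT * FROM table1_by_status WHERE status = 'Fail';
```

The scan is only reduced when the WHERE clause filters on the partition column; a query without that filter still reads every partition.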

See the user guide for information about partitioning. You may also want to consider Apache Parquet, a columnar data storage and interchange format that Athena supports.

Using a columnar format helps because Athena reads only the columns it needs to satisfy the query. For a SELECT * query it usually won't make much of a difference, but the I/O savings can be substantial if you're only interested in a few columns out of dozens or hundreds. In addition, Parquet and ORC (a competing columnar format also supported by Athena) both support compression, so even when all columns are accessed they still save a good deal over uncompressed CSV or JSON.
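
One possible way to get there is a CREATE TABLE AS SELECT (CTAS) query that rewrites the existing table as Parquet, partitioned by status. This is only a sketch: `table1_parquet`, the `id` and `message` columns, and the S3 location are hypothetical names chosen for illustration.

```sql
-- Hypothetical names; the source table is the TABLE1 from the question.
-- Partition columns must come last in the SELECT list.
CREATE TABLE table1_parquet
WITH (
  format = 'PARQUET',
  external_location = 's3://my-bucket/table1_parquet/',
  partitioned_by = ARRAY['status']
) AS
SELECT id, message, status
FROM table1;

-- Name only the columns you need instead of SELECT *,
-- so Athena can skip the other columns in the Parquet files.
SELECT id
FROM table1_parquet
WHERE status = 'Fail';
```

The CTAS query itself scans the whole source table once, so the savings come from the queries run against the converted table afterwards.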

Upvotes: 4
