StatsStudent
StatsStudent

Reputation: 1604

Limit data read from a table in Athena?

Is it possible to limit the number of rows read from a table with an Athena query, and if so how?

I looked at the LIMIT statement, but it seems LIMIT doesn't affect how many rows are scanned, but simply limits the number of rows returned from a query. But I'm not looking to reduce the number of rows returned, but restrict the query to be performed on, say, the top N rows in a table. For example, I have a table that has a date column, call it mydate, that is the same for every row in the table, mytable. I want to return a single value of mydate. I could do this with:

select 
     min(mydate)
from
     mytable

but this will perform a full table scan. What I really want is for a query to take the table I provide it and "look" at the first row and return mydate, so it only has to scan a single record.

I use the SAS statistical software and this can be done in it by running:

data work.temp;         /*create a temp dataset in the working directory*/
set  mytable(obs=2);    /*read in only the first two rows from the mytable table*/
run;

I also use R and in the R language, I could do this even more simply and directly using the following code:

mytable[1, ]$mydate

So, I'm looking for a way, if possible, to accomplish the same thing that SAS and R are doing above.

Thanks.

Upvotes: 0

Views: 250

Answers (0)

Related Questions