cloudycider

Reputation: 37

How to "LIMIT" the amount of data fetched in SQL

I'm querying multiple tables joined together

SELECT a.column, b.column, c.column 
FROM t1 AS a, t2 AS b, t3 AS c
WHERE a.column = b.column AND a.column = c.column

Is there any way to limit the amount of data scanned, so it doesn't query the entire dataset? Note that filtering by date/time or some other condition isn't an option here.

I know that even if you put LIMIT 100, the engine can still scan the entire result set. Is there a way to literally just query a random set of 100 rows and return them (cutting down on query time and workload)?

Upvotes: 0

Views: 1187

Answers (1)

Gordon Linoff

Reputation: 1269623

In databases that support LIMIT, the LIMIT applies to the result set, not to the data being scanned. The SQL optimizer is free to choose whatever execution plan it wants. It can take the LIMIT into account, so the query is optimized for "time to first row" rather than "time to last row".

Next, you should be using proper, explicit, standard, readable JOIN syntax. If you want to limit the amount of data read, you can put the LIMIT in a subquery:

SELECT a.column, b.column, c.column 
FROM (SELECT t1.*
      FROM t1 
      LIMIT 100
     ) a JOIN
     t2 b
     ON a.column = b.column JOIN
     t3 c
     ON c.column = b.column  -- or whatever;

Note: This is not guaranteed to return 100 rows, but it should limit the scanning of t1 -- which may or may not be relevant. Also, one of your original tags was BigQuery, and in BigQuery a LIMIT does not reduce the amount of data scanned or billed (as opposed to pruning partitions, which does).
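To see why the subquery version is not guaranteed to return 100 rows, here is a small sketch using SQLite via Python. The table and column names (t1, t2, col) are placeholders, not taken from your schema, and the LIMIT is shrunk to 3 for readability:

```python
import sqlite3

# In-memory database with two toy tables; values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col INTEGER);
    CREATE TABLE t2 (col INTEGER);
    INSERT INTO t1 (col) VALUES (1), (2), (3), (4), (5);
    INSERT INTO t2 (col) VALUES (1), (3);   -- only two values match t1
""")

rows = conn.execute("""
    SELECT a.col, b.col
    FROM (SELECT t1.* FROM t1 LIMIT 3) a
    JOIN t2 b ON a.col = b.col
""").fetchall()

# The subquery reads at most 3 rows of t1, but the join can then drop
# (or, with duplicate join keys, multiply) rows, so len(rows) need not be 3.
print(len(rows))
```

The scanning of t1 stops after 3 rows, but the join filters that sample further, which is exactly the caveat above.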

I should also note that LIMIT is usually used with ORDER BY, so the result set is stable. That is, ORDER BY makes the result consistent from one run to the next, rather than returning an indeterminate (but not random) 100 rows.
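A quick sketch of that stability point, again using SQLite via Python with made-up names and data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col INTEGER)")
conn.executemany("INSERT INTO t1 (col) VALUES (?)",
                 [(n,) for n in (5, 2, 9, 1, 7)])

# Without ORDER BY, which two rows come back is up to the engine --
# indeterminate, though typically not random either.
unstable = conn.execute("SELECT col FROM t1 LIMIT 2").fetchall()

# With ORDER BY, the same two rows come back on every run.
stable = conn.execute("SELECT col FROM t1 ORDER BY col LIMIT 2").fetchall()
print(stable)  # [(1,), (2,)]
```

If the goal really is a random sample rather than a stable one, that is a different construct entirely and varies by database.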

Upvotes: 2
