Reputation: 37
I'm querying multiple tables joined together
SELECT a.column, b.column, c.column
FROM t1 AS a, t2 AS b, t3 AS c
WHERE a.column = b.column AND a.column = c.column
Is there any way to limit the amount of data scanned, so it doesn't query the entire dataset? Note that filtering by date/time or some other condition isn't possible here.
I know that if you put LIMIT 100 it can still scan the entire result set. Is there a way to literally just query a random set of 100 rows and return them (cutting down on query time and workload)?
Upvotes: 0
Views: 1187
Reputation: 1269623
In databases that support LIMIT, the LIMIT applies to the result set, not to the data being scanned. The SQL optimizer is free to choose whatever execution plan it wants. It can take the LIMIT into account, so the query is optimized for "time to first row" rather than "time to last row".
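As a minimal sketch of that point, using the question's tables (column names are placeholders): even with a LIMIT, the engine is free to scan both tables in full before returning the first 100 rows.
SELECT a.column, b.column
FROM t1 a JOIN
     t2 b
     ON a.column = b.column
LIMIT 100;  -- caps the rows returned, not the rows examined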
Next, you should be using proper, explicit, standard, readable JOIN syntax. If you want to limit the amount of data read, then you can put the LIMIT in a subquery:
SELECT a.column, b.column, c.column
FROM (SELECT t1.*
      FROM t1
      LIMIT 100
     ) a JOIN
     t2 b
     ON a.column = b.column JOIN
     t3 c
     ON c.column = b.column;  -- or whatever the join condition is
Note: This is not guaranteed to return 100 rows, but it should limit the scanning of t1 -- which may or may not be relevant. Also, one of your original tags was BigQuery, and in BigQuery merely limiting the number of rows returned has no impact on the amount of data scanned or billed (as opposed to pruning partitions, which does).
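For contrast, a hedged sketch of partition pruning in BigQuery, assuming a hypothetical table partitioned on a DATE column named part_date (the question rules out such a filter, so this only illustrates what does reduce the bytes scanned):
SELECT a.column
FROM t1 AS a
WHERE a.part_date = DATE '2024-01-01';  -- hypothetical partition column; pruning, unlike LIMIT, cuts bytes scanned and billed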
I should also note that LIMIT is usually used with ORDER BY, so the result set is stable. That is, ORDER BY would make it consistent from one run to the next -- rather than returning an indeterminate (but not random) 100 rows.
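For example, assuming t1 has a unique key (here a hypothetical id column):
SELECT *
FROM t1
ORDER BY id  -- hypothetical unique column; the same 100 rows come back on every run
LIMIT 100;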
Upvotes: 2