Reputation: 4278
While preparing some requests, I was writing this :
SELECT *
FROM ta A
JOIN tb B
ON A.col1 = B.col1
JOIN tc C
ON B.col2 = C.col2
WHERE B.col3 = 'whatever'
AND C.col4 = 'whatever2'
And I began to think about the following :
SELECT *
FROM ta A
JOIN (SELECT * FROM tb WHERE col3 = 'whatever') B
ON A.col1 = B.col1
JOIN (SELECT * FROM tc WHERE col4 = 'whatever2') C
ON B.col2 = C.col2
(If I'm not mistaken, the result would be the same). I'm wondering if it would be significantly faster ? My guess is that it would but I'd be interested in knowing why/why not ?
(Because our server is down at the moment, I can't test it myself right now, so I'm asking here, I hope you won't mind.)
(In case it matters, the engine is Vertica, but my question isn't really specific to Vertica)
Upvotes: 0
Views: 163
Reputation: 5141
Your first query will work fine, but the second query will not be executed and causes error. The reason behind it is, you are taking JOIN (SELECT * FROM tb WHERE B.col3 = 'whatever') B ON A.col1 = B.col1
.
In this condition you are matching the column with A.col1 = B.col1
. Here you will get A.col1 from ta table, but you will not get B.col1. While specifying a sub query in the join, you should not use ' * ' operator. Joins will not recognize this operator in a sub query. You need to specify required column names. Like the example in below query,
SELECT *
FROM ta A
JOIN (SELECT col1,col2 FROM tb WHERE B.col3 = 'whatever') B
ON A.col1 = B.col1
JOIN (SELECT col2 FROM tc WHERE C.col4 = 'whatever2') C
ON B.col2 = C.col2
This will execute and provides you a result. Two columns is taken in the first join sub query col1,col2, as you are using the condition B.col2 from B table in the second join condition. In a select clause you can provide ' * ' operator which provides you all the columns from all three tables. But you are not supposed to use the operator in a sub query of a join, as joins are coded in such a way.
Both the queries does not have much difference, but your first logic will execute faster compared to the second. In the second logic, two sub queries are used which makes multiple searches in the database and provides you result little slower than the first logic.
Upvotes: 0
Reputation: 7616
Your second query is a little off, it should be:
SELECT *
FROM ta A
JOIN (SELECT * FROM tb WHERE tb.col3 = 'whatever') B
ON A.col1 = B.col1
JOIN (SELECT * FROM tc WHERE tc.col4 = 'whatever2') C
ON B.col2 = C.col2
Notice the inline view where clauses need to reference the table in scope, not the alias for the view. B and C are out of scope within the inline views.
In any case, because you are doing an inner join, it won't matter from a results perspective because the condition is the same whether it occurs pre-join or post-join.
You can reasonably rely on the optimizer to do the following:
That said, there should be no difference between the two statements. Most likely it is pushing down predicates for the first one to make it more like the second one. If you have statistics gathered, the optimizer should be smart enough to query these the same way (or really close).
That isn't to say I haven't seen what you have in your second query "fix" query issues for me in Vertica... but usually it's only when I am using multiple COUNT(DISTINCT ...)
expressions or theta joins, etc.
Now if this were an outer join, then the statements would be different. The first one would apply the filter after the join, the second would be before the join.
Of course, I'll mention that you really just need to do an explain of both methods. Just make sure statistics are gathered.
Hope it helps.
Upvotes: 1