Reputation: 2493
I can see there is a difference in the amount of rows but I am still not able to understand & account for it.
sel a.15 ,b.16 , c.17, d.18 from
a LOJ b on (a.1=b.1 and a.2=b.2 and a.3=b.3 and a.4=b.4 and a.5=b.5 )
LOJ c on ( a.1=c.1 and a.2=c.2 and a.3=c.3 and a.4=c.4 and a.5=c.5)
LOJ d on ( a.6=d.6 and a.7=d.7 and a.3=d.3 and a.4=d.4)
group by 1,2,3,4
Table a
is a billion + row fact table and PI keys of a
are nowhere participating on the join but b
and c's
PI keys are present in the joins and some PI keys of d
too.
So I rewrote as
select < same keys >
from a LOJ ( select <keylist here> from
b JOIN c
on
(b.1=c.1 and b2.=c.2 and b3=c.3 and b4=c.4 and b.5=c.5)
JOIN d
on ( b.3=d.3 and and b.4=d.4)
) e
on
( e.1=a.1 and e.2=a.2 and e.3=a.3 and e.4=a.4
and e.5=a.5 and e.6=a.6 and a.7=e.7 )
group by <col list>
/*DT e created that joins b c and d.Though I joined on 3 and 4 before
with b and d I took them again in the derived table e
and used them again to improve query efficiency */
The above query runs a ton faster - 1/4th Impact CPU and > 50% off the spool but
I see the small row count difference. So my logic was a
was LOJ with these tables b,c,d
separately. The rewrite will give a
just 1 single table to LOJ with ... kind of confluence of b,c,d
( a 'fused table' , if you will ). How does it matter to a
if it LOJ separately or joins to 1 single table. But it does - I am loosing rows and I want understand how . I understand IJ's can be rewritten like this but LOJ's aren't the same. There is a SMALL difference in count (*) - some 15 rows / million. Well my question is what modifications do I add to the 2nd query so those 15 rows ( the delta ) can also be included and the queries are equivalent.
Well on after thoughts after seeing this again.. I do understand how they can give varying results ( because I am LOJ'ing a
separately thrice )but my Question was what should I add to 2nd to make it equivalent to the 1st.
Upvotes: 1
Views: 1004
Reputation: 915
Your two queries are not the same.
Take a look at first query. In it, a joins all records in b initially. This also gives all records in a, even though it is not present in b, due to left join. This result set is then left joined with c. Again all records in 'a' are retrieved. Same goes in join between d and a. Here, b, c and d are independent of each other.
However, in your second query, you perform inner joins between b and c. This effectively means only rows whose keys are present in both b and c are fetched. Again this result set is INNER joined with d, which results in loss of few records. It is this end product which you left join with a. The records are lost in the subselect itself.
You are observing a small difference right now, because most of the records may have got their matches during inner join. However, your optimization diverges from the original approach.
Upvotes: 1