Reputation: 1514
I have a performance question about joining tables in Kafka. The topology is currently defined as in the following code:
table1
.leftJoin(table2, Pair::with)
.leftJoin(table3, Pair::add)
.join(table4, (left) -> left.getValue(0).getId(), Triplet::add)
.leftJoin(table5, Quartet::add)
.leftJoin(table6, Quintet::add)
I just wanted to know: if I move the .join before the others, will that improve performance and the speed of consuming the data (as in the code below)?
table1
.join(table4, (left) -> left.getValue(0).getId(), Pair::with)
.leftJoin(table2, Pair::add)
.leftJoin(table3, Triplet::add)
.leftJoin(table5, Quartet::add)
.leftJoin(table6, Quintet::add)
Upvotes: 0
Views: 45
Reputation: 2061
Yes, performance will be improved. Let's assume the database provider doesn't do anything else, such as automatically optimizing the query.
Way 1: A left join B left join C inner join D
1. A left join B => Full records A
2. A left join C => Full records A
3. A inner join D => Partial A
Way 2: A inner join D left join B left join C
1. A inner join D => Partial A => A1 (significant improvement here)
2. A1 left join B => Full A1
3. A1 left join C => Full A1
At step 1, way 2 has already reduced the number of rows, so fewer records are fed into the left joins with B and C.
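To make the row counting above concrete, here is a minimal sketch in plain Java. It only models the simplified batch-style reasoning of this answer (rows are bare integer keys, tables are in-memory collections) and does not reflect how Kafka Streams processes table joins incrementally. The class name JoinOrderDemo, its helper methods, and the sample data are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model: a "row" is just its integer key.
class JoinOrderDemo {

    // Inner join: keep only the left rows whose key exists on the right.
    static List<Integer> innerJoin(List<Integer> left, Set<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer key : left) {
            if (right.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    // Left join: every left row survives, matched or not.
    static List<Integer> leftJoin(List<Integer> left, Set<Integer> right) {
        return new ArrayList<>(left);
    }

    // Way 1: A left join B left join C inner join D.
    // Returns the total number of rows fed into the three joins.
    static int rowsLeftJoinsFirst(List<Integer> a, Set<Integer> b,
                                  Set<Integer> c, Set<Integer> d) {
        int fed = a.size();               // input of join 1
        List<Integer> ab = leftJoin(a, b);
        fed += ab.size();                 // input of join 2
        List<Integer> abc = leftJoin(ab, c);
        fed += abc.size();                // input of join 3
        innerJoin(abc, d);
        return fed;
    }

    // Way 2: A inner join D left join B left join C.
    static int rowsInnerJoinFirst(List<Integer> a, Set<Integer> b,
                                  Set<Integer> c, Set<Integer> d) {
        int fed = a.size();               // input of join 1
        List<Integer> a1 = innerJoin(a, d);
        fed += a1.size();                 // input of join 2 (already reduced)
        List<Integer> a1b = leftJoin(a1, b);
        fed += a1b.size();                // input of join 3
        leftJoin(a1b, c);
        return fed;
    }

    public static void main(String[] args) {
        // Made-up sample data: A has 10 rows, only 3 of them match D.
        List<Integer> a = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Set<Integer> b = Set.of(1, 2, 3, 4, 5);
        Set<Integer> c = Set.of(2, 4, 6, 8);
        Set<Integer> d = Set.of(1, 2, 3);

        System.out.println("Way 1: " + rowsLeftJoinsFirst(a, b, c, d)); // 30
        System.out.println("Way 2: " + rowsInnerJoinFirst(a, b, c, d)); // 16
    }
}
```

With this sample data, way 1 feeds 10 + 10 + 10 rows into its joins and way 2 feeds 10 + 3 + 3. The gain depends entirely on how selective the inner join is: if every row of A matches D, both orderings process the same number of rows.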
Upvotes: 1