Reputation: 11
when I try to execute an inner join in ArangoDB with only one filter, like this:
FOR doc1 in catalogue
FOR doc2 in RepoNodes
FOR doc3 in RepoEdges
FOR doc4 in RepoNodes
FOR doc5 in RepoEdges
FOR doc6 in RepoNodes
FOR doc10 in catalogue
FOR doc11 in similarities
FOR doc12 in clearance
FILTER doc1.trackid== "TRAAAAK128F9318786" AND doc1.trackid==doc2.mongodbsongs
AND doc3._from==doc2._id AND doc3._to ==doc4._id AND doc5._from==doc4._id and doc6._id ==doc5._to
AND doc10.trackid== doc6.mongodbsongs
AND doc11._from==CONCAT("Tracks/",doc6.neo4jSong)
AND doc6.redisclearance== doc12._key
return {doc10,doc11,doc12}
i notice that indexes are ignored.. I mean.. It executes a full collection scan. if I serialize the same query, it works perfectly. I don't understand why.. If I stop before the last 4 FOR it's ok.. where is the problem? Why is a full collection scan needed?
Upvotes: 1
Views: 315
Reputation: 9097
n the latter query (the one that does not use indexes), the query optimizer executes a function that will move the FOR loops around in the query. It will try to create all possible permutations of FOR loops that don't change the meaning of the query.
In the latter query, the optimizer is free to move around any of the 9 FOR loops, because there are no FILTER statements between them that could be problematic. Here the optimizer will start creating new execution plans with FOR loops moved around. This could create 9! (faculty, i.e. 362880) different execution plans, but by default the optimizer stops at 192 plans in order to avoid that combinatoric explosion. And when that number of plan is reached, the optimizer will stop applying some further optimizations in order to put a cap on the overall optimization runtime.
That's the reason why in the second query the EnumerateCollectionNodes are not converted into IndexNodes. The optimizer has stopped before this step because there are already so many execution plans.
In the first query, the optimizer is not as free to move around the FOR loops, because the FILTER statements restrict some of the movements. And because of that, it will not generate as much execution plans and not stop optimizing prematurely.
Release 3.0.2 contains a fix for this. A temporary workaround before that is to split the one FILTER statement in the second query into multiple smaller FILTER statements and move them near their respective FOR loops.
Upvotes: 2