Reputation: 1717
When executing AQL against an indexed database, I find that my queries are significantly faster. However, the query explainer states that no indices were used. How can this be? Is it a bug? I run the same query in two identical databases, except that one has a variety of indices and the other has no indices. In the indexed database, the query takes 5 seconds. In the non-indexed database, the query takes 13 seconds. Both databases say that no indices were used when explaining the query, and provide identical execution plans and optimization rules.
let lastYear = left(date_subtract(date_now(),1,"year"),10)
for v0, e0, in outbound 'myFleet' graph 'myGraph'
filter e0.type == 'myType' && v0.attr == 'myAttr'
let myCount = (
for v1, e1 in outbound v0._id graph 'myOtherGraph'
filter e1.category == "myCategory" && e1.date > lastYear
collect with count into stuff
return stuff)
sort myCount desc
return {profile: v0.name, count: mycount[0]}
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 CalculationNode 1 - LET lastYear = LEFT(DATE_SUBTRACT(DATE_NOW(), 1, "year"), 10) /* v8 expression */
3 TraversalNode 3 - FOR v0 /* vertex */, e0 /* edge */ IN 1..1 /* min..maxPathDepth */ OUTBOUND 'myFleet' /* startnode */ GRAPH 'myGraph'
4 CalculationNode 3 - LET #11 = (((e0.`type` == "myType") && (v0.`attr` == "myAttr"))) /* simple expression */
5 FilterNode 3 - FILTER #11
15 SubqueryNode 3 - LET myCount = ... /* subquery */
6 SingletonNode 1 * ROOT
7 CalculationNode 1 - LET #13 = v0.`_id` /* attribute expression */
8 TraversalNode 10 - FOR v1 /* vertex */, e1 /* edge */ IN 1..1 /* min..maxPathDepth */ OUTBOUND #13 /* startnode */ GRAPH 'myOtherGraph'
9 CalculationNode 10 - LET #17 = (e1.`category` == "myCategory") /* simple expression */
10 FilterNode 10 - FILTER #17
11 CalculationNode 10 - LET #19 = (e1.`date` > lastYear) /* simple expression */
12 FilterNode 10 - FILTER #19
13 CollectNode 1 - COLLECT WITH COUNT INTO stuff /* sorted*/
14 ReturnNode 1 - RETURN stuff
16 SortNode 3 - SORT myCount DESC
17 CalculationNode 3 - LET #21 = { "profile" : v0.`name`, "count" : myCount[0] } /* simple expression */
18 ReturnNode 3 - RETURN #21
In the indexed database:
v0.attr
has: {hash: {unique: false, sparse: false}}
with selectivity 0.08%e1.category
has {hash: {unique: false, sparse: false}}
with selectivity 0.00%e1.date
has {skiplist: {unique: false, sparse: false}}
In the non-indexed database, the only indices are the defaults for document keys and edge identifiers.
In the indexed database, the query is significantly faster than the non-indexed database, but the query explainer on both says:
Indexes used:
none
It is clear that the database is using the indices, but the query explainer won't say how.
Upvotes: 2
Views: 90
Reputation: 9097
(note: just gave the same answer here as the questions are similar)
In ArangoDB 3.0 a traversal will always use the edge index to find connected vertices, regardless of which filter conditions are present in the query and regardless of which indexes exist.
In ArangoDB 3.1 the optimizer will try to find the best possible index for each level of the traversal. It will inspect the traversal's filter condition and for each level pick the index for which it estimates the lowest cost. If there are no user-defined indexes, it will still use the edge index to find connected vertices. Other indexes will be used if there are filter conditions on edge attributes which are also indexed and the index has a better estimated average selectivity than the edge index.
In 3.1.0 the explain output will always show "Indexes used: none" for traversals, even though a traversal will always use an index. The index display is just missing in the explain output. This has been fixed in ArangoDB 3.1.1, which will show the individual indexes selected by the optimizer for each level of the traversal.
For example, the following query shows the following explain output in 3.1:
Query string:
FOR v, e, p in 0..3 ANY 'v/test0' e
FILTER p.edges[0].type == 1 && p.edges[2].type == 2
RETURN p.vertices
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 TraversalNode 8000 - FOR v /* vertex */, p /* paths */ IN 0..3 /* min..maxPathDepth */ ANY 'v/test0' /* startnode */ e
3 CalculationNode 8000 - LET #5 = ((p.`edges`[0].`type` == 1) && (p.`edges`[2].`type` == 2)) /* simple expression */
4 FilterNode 8000 - FILTER #5
5 CalculationNode 8000 - LET #7 = p.`vertices` /* attribute expression */
6 ReturnNode 8000 - RETURN #7
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
2 edge e false false 10.00 % [ `_from`, `_to` ] base INBOUND
2 edge e false false 10.00 % [ `_from`, `_to` ] base OUTBOUND
2 hash e false false 63.60 % [ `_to`, `type` ] level 0 INBOUND
2 hash e false false 64.40 % [ `_from`, `type` ] level 0 OUTBOUND
2 hash e false false 63.60 % [ `_to`, `type` ] level 2 INBOUND
2 hash e false false 64.40 % [ `_from`, `type` ] level 2 OUTBOUND
Additional indexes are present on [ "_to", "type" ]
and [ "_from", "type" ]
. Those are used on levels 0 and 2 of the traversal because there are filter conditions for the edges on these levels that can use these indexes. For all other levels, the traversal will use the indexes labeled with "base" in the "Ranges" column.
The explain output fix will become available with 3.1.1, which will be released soon.
Upvotes: 2