Reputation: 41
I'm obviously missing something, when I run the following query I get a 40ms response:
FOR t IN tt
FOR a in aa
FILTER a.from._key == t._key
AND a.value == @v
LIMIT 20 RETURN t
But when I add a sort the query runs until I have to kill it!
FOR t IN tt
FOR a in aa
FILTER a.from._key == t._key
AND a.value == @v
SORT t.identifier.value ASC
LIMIT 20 RETURN t
I have skiplist index on identifier.value but when I look at the execution plan I see a scary full collection scan.
2 EnumerateCollectionNode 240126 - FOR t IN tt /* full collection scan */
I have 250000 documents in tt like:
{
“_key”: "CA0350ED-4929-3D1A-37A5-8AB951CE3AB5"
"type": “f.g.h”,
"identifier": {
“dist”: “N”,
"value": "Matt100145 ZZZ"
}
}
and 500000 documents in aa like:
{
“_key”: "1BBDE335-B118-3ED5-DE3B-77D822822210"
"type": “a.b.c”,
"from": {
"_key": "CA0350ED-4929-3D1A-37A5-8AB951CE3AB5",
"value": "Matt100145 ZZZ”
}
},
"value": “ZZZ”
}
All identifier.value values in tt are MattNUMBER ZZZ (test data) so maybe that is causing the trouble?
My query plan without the sort looks like:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
9 IndexNode 2 - FOR a IN aa /* hash index scan */
8 IndexNode 2 - FOR t IN tt /* primary index scan */
6 LimitNode 2 - LIMIT 0, 20
7 ReturnNode 2 - RETURN t
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
9 hash aa false false 50.00 % [ `value` ] (a.`value` == "Matt123")
8 primary tt true false 100.00 % [ `_key` ] (a.`from`.`_key` == t.`_key`)
and with the SORT looks like:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 85706 - FOR t IN tt /* full collection scan */
6 CalculationNode 85706 - LET #4 = t.`identifier`.`value` /* attribute expression */ /* collections used: t : tt */
10 IndexNode 85706 - FOR a IN aa /* hash index scan */
11 CalculationNode 85706 - LET #2 = (a.`value` == "Matt123") /* simple expression */ /* collections used: a : aa */
5 FilterNode 85706 - FILTER #2
7 SortNode 85706 - SORT #4 ASC
8 LimitNode 20 - LIMIT 0, 20
9 ReturnNode 20 - RETURN t
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
10 hash aa false false 50.00 % [ `from._key` ] (a.`from`.`_key` == t.`_key`)
Can anyone help?
Upvotes: 1
Views: 91
Reputation: 9097
It's unclear in your case which indexes are actually there on the two collections. One index is mentioned, but to run well, two indexes would be even better. Here is the setup I used:
db.tt.ensureIndex({ type: "skiplist", fields: ["identifier.value"] });
db.aa.ensureIndex({ type: "skiplist", fields: ["value"] });
With this setup and some sample data, the execution plan I am getting with ArangoDB 3.3.12 is:
Query string: FOR t IN tt FOR a in aa FILTER a.from._key == t._key AND a.value == @v LIMIT 20 RETURN t
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
9 IndexNode 25000 - FOR a IN aa /* skiplist index scan */
8 IndexNode 25000 - FOR t IN tt /* primary index scan */
6 LimitNode 20 - LIMIT 0, 20
7 ReturnNode 20 - RETURN t
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
9 skiplist aa false false n/a [ `value` ] (a.`value` == 123)
8 primary tt true false 100.00 % [ `_key` ] (a.`from`.`_key` == t.`_key`)
This seems to be a good execution plan. Can you check if these indexes help in your case too?
(note: next time please post all indexes you have created plus the full execution plan and not just a fragment - that way you may get more precise help)
Upvotes: 1