Reputation: 131
Have document collections "items", and "dictionary1", "dictionary2"
collection "items" have 50000 records, like
{
label:"..."
dict1: "dictionary1/33333"
dict2: "dictionary2/44444"
....
}
When I join dictionaries with
FOR item IN items
LET dictname = FIRST(FOR d IN dictionary1 FILTER d._id == item.dict1 RETURN d.name)
RETURN { _id: item._id, name: item.name, dict: dictname }
Query execution time ~ 150ms its to much for this simple task. For experiment I am force dict for one _id
FOR item IN items
LET dictname = FIRST(FOR d IN dictionary1 FILTER d._id == "dictionary1/10000" RETURN d.name)
RETURN { _id: item._id, name: item.name, dict: dictname }
Query execution time ~ 130ms
Also I an tried DOCUMENT function
FOR item IN items
LET dictname = DOCUMENT("dictionary1", "dictionary1/10000")
RETURN { _id: item._id, name: item.name, dict: dictname.name }
Query execution time ~ 1500ms :((
And simple read from collection:
FOR item IN items
RETURN { _id: item._id, name: item.name }
Query execution time ~ 30ms
So result not cached when I get same document by _id 50000 times as join. Variant with duplicate content in one collection not useful for me. Now I move logic to client side, select all dictionaries in memory and join after query, and have ~60ms, but I don't like this method, it wrong.
Any way to get result with joined dictionaries(80% same _id) with short time?
Upvotes: 1
Views: 865
Reputation: 9097
If you're sure that there is a matching item in dictionary1
for each items
that you loop over in the outer loop, the query can be simplified to a simpler equi-join like this:
FOR item IN items
FOR d IN dictionary1
FILTER d._id == item.dict1
RETURN { _id: item._id, name: item.name, dict: d.name }
This removes the need for the correlated subquery and the function call. However, its result will only be equivalent to the original subquery solution if there are matching dictionary
entries for each item
. If there is no dictionary
entry for an item
, the equi-join query will simply suppress it.
If it's unclear whether there'll be an entry in dictionary1
, and you want null
s to be returned for non-matches, the subquery solution can be simplified to (this removes the function call to FIRST
):
FOR item IN items
LET sub = (FOR d IN dictionary1
FILTER d._id == item.dict1
RETURN d.name
)
RETURN { _id: item._id, name: item.name, dict: sub[0] }
Apart from that, joining on the _key
attribute instead of joining on _id
may provide minor speedups, as _key
is shorter and less complex to produce than _id
. But that will require storing the join values in items
without the dictionary collection names prefixes.
The above suggestions shouldn't provide order-of-magnitude runtime improvements, but maybe they help.
Upvotes: 1