Reputation: 1302
My database has the following "schema":
My modest test database contains:
Now I would like to query the most popular users, which is the users who have gathered the most likes across all their posts. I came up with the following query:
FOR u IN users
LET nblikes = SUM(FOR post IN 1 OUTBOUND u isAuthor
RETURN LENGTH(GRAPH_EDGES('my-graph', post, { edgeCollectionRestriction: 'likes' })))
SORT nblikes DESC
RETURN {
"username": u.username,
"nblikes": nblikes
}
which executes in approximately 0.8s on my Mid-2014 MacBookPro (2.8GHz Core i7, 16GB RAM). 0.8s isn't shameful but on such a small dataset, I would have expected better considering that, AFAIK, that's all happening in memory.
So I would appreciate if some ArangoDB gurus out there could review my query and hint some potential performance issues. Thanks a lot!
Upvotes: 2
Views: 164
Reputation: 9097
There are a few ways to make this query run faster.
What will improve it most is replacing the inner call to GRAPH_EDGES
with another traversal of depth 1 to find the "likers" as shown below:
FOR u IN users
LET nblikes = SUM(
FOR post IN 1 OUTBOUND u isAuthor
RETURN LENGTH(
/* the following traversal replaces the call to GRAPH_EDGES */
FOR liker IN 1 INBOUND post._id likes
RETURN 1
)
)
SORT nblikes DESC
RETURN {
username: u.username,
nblikes: nblikes
}
The inner function call to GRAPH_EDGES
is quite expensive, and getting rid of it will improve the query execution time much.
Another variant is to replace the (now) two traversals with a plain join like this:
FOR u IN users
LET nblikes = SUM(
/* the following join between users, isAuthor and likes
replaces the traversal & GRAPH_EDGES calls */
FOR a IN isAuthor
FILTER a._from == u._id
FOR l IN likes
FILTER l._to == a._to
RETURN 1
)
SORT nblikes DESC
RETURN {
username: u.username,
nblikes: nblikes
}
Both variants should be faster than the initial query, mainly because GRAPH_EDGES
is expensive to call in a loop. As it is a stateless AQL function, it will need to set up its context repeatedly (as often as it's called from the inner loop). The traversals and the joins solution can keep a bit more context between calls so they are "cheaper".
Upvotes: 3