Reputation: 307
Some background information: I have a "question and answer" site with lots of posts and users. The data are stored in an MSSQL database. I am interesting in how a graph database will help producing interesting results to users, like suggest similar users or suggest relevant posts.
I managed to import all posts and users (only their ids) and the "post to post" and "post to user" relation from a test database into ArangoDB database and ended up with 4 collections:
Then I built a graph "postGraph" with the 2 edges and created an AQL query trying to implement a simple "similar users" algorithm based on the "user posts" and "post post" relations. Here is the query searching top 10 similar users for user with _id "users/1":
FOR e in GRAPH_NEIGHBORS('postGraph', 'users/1',
{ edgeCollectionRestriction: ['parentToChildPosts', 'postToUsers'],
vertexCollectionRestriction: 'users',
minDepth: 3,
maxDepth: 3 })
FILTER e.vertex._key != '1'
COLLECT userid = e.vertex._key INTO g
SORT LENGTH(g) DESC
LIMIT 10
RETURN userid
Visually, I started search from a user 'users/1', and find paths to other users following:
user => post => parent/child post => user.
So the depth of traversals is 3. Then I exclude the user himself (FILTER), group by user keys and count the number of results and grab the top 10.
I tested the query and it worked for most users but failed for some active users with the error:
[1909] too many iterations
The query runs quite fast (within one second) and gives the error. One of the failing active users has 727 posts (not so many in my opinion).
I also tried 2 other ways to implement the algorithm:
From the documentation, ArangoDB suggests using graph functions for this kind of situation. So I am not in favor of using Join (gives me a feeling I can also do it in SQL, so why bother using graph database).
Any suggestion how to make the GRAPH_NEIGHBORS work quickly without any errors? Or any other suggestion how to build graph, like how large the graph can grow?
Upvotes: 2
Views: 151
Reputation: 3267
Hi you are totally right, the GRAPH_NEIGHBORS function is built exactly for that.
There is a parameter maxIterations
limiting the amount of vertices that will be touched by one traversal (where GRAPH_NEIGHBORS is mapped onto) and your graph structure might hit the default value (10000) there. We implemented this value to prevent endless looping.
So to fix this you simply increase this value.
Also i have another performance hint for you:
Here is an updated query:
FOR e in GRAPH_NEIGHBORS('postGraph', 'users/1',
{ maxIterations: 1000000,
vertexCollectionRestriction: 'users',
minDepth: 3,
maxDepth: 3 })
FILTER e.vertex._key != '1'
COLLECT userid = e.vertex._key INTO g
SORT LENGTH(g) DESC
LIMIT 10
RETURN userid
For further info please stick to the information in the AQL Traversal documentation. Everything that is available there is also available for GRAPH_NEIGHBORS
Upvotes: 5