Reputation: 3783
I recently picked up Neo4j as it seemed the best type of database to store data I'm currently scraping off a number of online discussion forums. The primary structure of the graph is Community -> Forum -> Thread -> Post -> Author
I'm trying to write the Cypher queries to resolve GraphQL queries, and would like to paginate (for example) the Forum -> Thread connection. The relationship is CONTAINS, which holds an order property, i.e. (f:Forum)-[:CONTAINS]->(t:Thread).
From the neo4j-graphql-js library I picked up on their usage of pattern comprehension to run an "inner query" on the child nodes. For example:
MATCH (f:Forum { id: $id })
RETURN f { .id, .name, .url, threads: [(f)-[:CONTAINS]->(t:Thread) | t { .id, .title, .url }] }
I'd really like to use ORDER BY, LIMIT and SKIP on the inner pattern comprehension, but unfortunately this is not supported: https://github.com/opencypher/openCypher/issues/202 - the neo4j-graphql-js library gets around this by using apoc.coll.sortMulti, but I've noticed performance is not great and is a lot slower than an equivalent top-level pattern with an ORDER BY clause.
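Roughly, that workaround looks like the query below (a sketch rather than the library's exact generated Cypher; $limit and $skip are illustrative parameters, and c is just a variable for the CONTAINS relationship). Since the whole list is materialised and sorted in memory before the limit/skip are applied, that would also explain the slowdown I'm seeing:
MATCH (f:Forum { id: $id })
RETURN f { .id, .name, .url,
  threads: apoc.coll.sortMulti(
    [(f)-[c:CONTAINS]->(t:Thread) | t { .id, .title, .url, order: c.order }],
    ['^order'],       // '^' prefix = ascending, per the APOC docs
    $limit, $skip) }  // limit and skip applied after sorting the full list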
Since I'm new to graph databases, this has got me wondering if I may have misunderstood how a graph DB is supposed to be used. From a "frontender" perspective, the ability to implement pagination in the lowest-level query language seems like a critical piece, but again, maybe I've not understood things correctly. Square peg, round hole, and all that!
Is this a fair evaluation? Is there another option in Cypher that will solve this? Or should I just go back to using an SQL database for this use case?
Upvotes: 1
Views: 258
Reputation: 66967
[EDITED]
If you moved the order property into the Thread nodes (which should be valid if each Thread node is connected to only a single Forum), then you can create an index (or uniqueness constraint) on :Thread(order) to speed up your query.
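For example, the index could be created like this (the exact syntax depends on your Neo4j version, and the index name is just illustrative):
// Neo4j 3.x
CREATE INDEX ON :Thread(order)
// Neo4j 4.x and later
CREATE INDEX thread_order FOR (t:Thread) ON (t.order)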
For example, this query should leverage the index to paginate forward faster (assuming that the f.id, the order value to use for pagination purposes, and the limit value are passed as the parameters id, order, and limit):
MATCH (f:Forum)-[:CONTAINS]->(t:Thread)
WHERE f.id = $id AND t.order > $order
WITH f, t
ORDER BY t.order
LIMIT $limit
RETURN f{.id, .name, .url,
  firstOrder: min(t.order),
  lastOrder: max(t.order),
  threads: [x IN collect(t) | x{.id, .title, .url}]}
And here is a (slightly more complex, but also fast) query for backwards pagination:
MATCH (f:Forum)-[:CONTAINS]->(t:Thread)
WHERE f.id = $id AND t.order < $order
WITH f, t
ORDER BY t.order DESC
LIMIT $limit
WITH f, t
ORDER BY t.order
RETURN f{.id, .name, .url,
  firstOrder: min(t.order),
  lastOrder: max(t.order),
  threads: [x IN collect(t) | x{.id, .title, .url}]}
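(The firstOrder and lastOrder values returned with each page can then be fed back in as the $order parameter when requesting the next or previous page.)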
If you analyze the PROFILE of the above queries with various $limit values, you should see that the db-hit complexity is O(F*L), where F is the number of Forum nodes (which is probably relatively constant) and L is the $limit value. So, these queries should be significantly faster -- with the index -- as long as F*L << (average number of Threads per Forum).
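As a quick way to check (a sketch; RETURN count(t) just stands in for the full projection to keep the profile output short, and real parameter values need to be supplied):
PROFILE
MATCH (f:Forum)-[:CONTAINS]->(t:Thread)
WHERE f.id = $id AND t.order > $order
WITH f, t
ORDER BY t.order
LIMIT $limit
RETURN count(t)  // look for an index seek on :Thread(order) and db hits that grow with $limit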
Upvotes: 1