Reputation: 3783
I recently picked up Neo4j as it seemed the best type of database to store data I'm currently scraping off a number of online discussion forums. The primary structure of the graph is Community -> Forum -> Thread -> Post -> Author
I'm trying to write the Cypher queries to resolve GraphQL queries, and would like to paginate (for example) the Forum -> Thread connection. The relationship is CONTAINS, which holds an order property, i.e. (f:Forum)-[:CONTAINS]->(t:Thread).
From the neo4j-graphql-js library I picked up on their usage of pattern comprehension to run an "inner query" on the child nodes. For example:
MATCH (f:Forum { id: $id })
RETURN f { .id, .name, .url, threads: [(f)-[:CONTAINS]->(t:Thread) | t { .id, .title, .url }] }
I'd really like to use ORDER BY, LIMIT and SKIP on the inner pattern comprehension, but unfortunately this is not supported: https://github.com/opencypher/openCypher/issues/202 - the neo4j-graphql-js library gets around this by using apoc.coll.sortMulti, but I've noticed performance is not great and is a lot slower than an equivalent top-level pattern with an ORDER BY clause.
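Roughly, that workaround looks like the query below (a sketch rather than the library's exact generated Cypher; $limit and $skip are illustrative parameters, and c is just a variable for the CONTAINS relationship). Since the whole list is materialised and sorted in memory before the limit/skip are applied, that would also explain the slowdown I'm seeing:
MATCH (f:Forum { id: $id })
RETURN f { .id, .name, .url,
  threads: apoc.coll.sortMulti(
    [(f)-[c:CONTAINS]->(t:Thread) | t { .id, .title, .url, order: c.order }],
    ['^order'],       // '^' prefix = ascending, per the APOC docs
    $limit, $skip) }  // limit and skip applied after sorting the full list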
Since I'm new to graph databases, this has got me wondering if I may have misunderstood how a graph DB is supposed to be used. From a "frontender" perspective, the ability to implement pagination in the lowest-level query language seems like a critical piece, but again, maybe I've not understood things correctly. Square peg, round hole, and all that!
Is this a fair evaluation? Is there another option in Cypher that will solve this? Or should I just go back to using an SQL database for this use case?
Upvotes: 1
Views: 258
Reputation: 66967
[EDITED]
If you moved the order property into the Thread nodes (which should be valid if each Thread node is connected to only a single Forum), then you can create an index (or uniqueness constraint) on :Thread(order) to speed up your query.
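For example, the index could be created like this (the exact syntax depends on your Neo4j version, and the index name is just illustrative):
// Neo4j 3.x
CREATE INDEX ON :Thread(order)
// Neo4j 4.x and later
CREATE INDEX thread_order FOR (t:Thread) ON (t.order)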
For example, this query should leverage the index to paginate forward faster (assuming that the f.id, the order value to use for pagination purposes, and the limit value are passed as the parameters id, order, and limit):
MATCH (f:Forum)-[:CONTAINS]->(t:Thread)
WHERE f.id = $id AND t.order > $order
WITH f, t
ORDER BY t.order
LIMIT $limit
RETURN f{.id, .name, .url,
  firstOrder: min(t.order),
  lastOrder: max(t.order),
  threads: [x IN collect(t) | x{.id, .title, .url}]}
And here is a (slightly more complex, but also fast) query for backwards pagination:
MATCH (f:Forum)-[:CONTAINS]->(t:Thread)
WHERE f.id = $id AND t.order < $order
WITH f, t
ORDER BY t.order DESC
LIMIT $limit
WITH f, t
ORDER BY t.order
RETURN f{.id, .name, .url,
  firstOrder: min(t.order),
  lastOrder: max(t.order),
  threads: [x IN collect(t) | x{.id, .title, .url}]}
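(The firstOrder and lastOrder values returned with each page can then be fed back in as the $order parameter when requesting the next or previous page.)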
If you analyze the PROFILE of the above queries with various $limit values, you should see that the db-hit complexity is O(F*L), where F is the number of Forum nodes (which is probably relatively constant) and L is the $limit value. So, these queries should be significantly faster -- with the index -- as long as F*L << (average number of Threads per Forum).
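As a quick way to check (a sketch; RETURN count(t) just stands in for the full projection to keep the profile output short, and real parameter values need to be supplied):
PROFILE
MATCH (f:Forum)-[:CONTAINS]->(t:Thread)
WHERE f.id = $id AND t.order > $order
WITH f, t
ORDER BY t.order
LIMIT $limit
RETURN count(t)  // look for an index seek on :Thread(order) and db hits that grow with $limit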
Upvotes: 1