IamMowgoud

Reputation: 308

Neo4j graph performance: should I cache slow queries in a separate database?

Setup/Intro

My Neo4j graph has 10k+ nodes. On the frontend app I need to display a sub-graph (100-500 nodes) between 2 start/end nodes, along with info about the critical path and all the dependencies of each node (the upstream/downstream paths from the start node and to the end node).

I have a list of all possible start/end nodes and it's tiny (~10 pairs).

The start and end nodes are the params of the request.

The response I currently send from the middleware to the UI looks something like this:

Nodes: [
{
  Id: 4,
  downstreamIds: [5,6,7], //all nodes on the paths leading to end node
  upstreamIds: [1,2,3], //all nodes on the paths coming from start node
  ...
},
...
]

Problem

The issue is that for each node I run 2 separate queries to get the downstream and upstream lists. So for n nodes I have 1 query for the nodes + 2n queries for downstream/upstream + 1 query for the critical path (nodes with slack = 0).

It takes 1,002 queries (1 + 2 * 500 + 1) to fetch a start/end sub-graph that has 500 nodes in it.

The critical path query is fast and not an issue.

However, overall this request can take up to 2 minutes in the worst-case scenario, i.e. when each node has all other nodes as downstream and upstream dependencies.

Possible solutions

  1. Return a list of all relationships, which is up to 2n² edges (500 * 500 * 2 = 500,000 in the worst case), and calculate the downstream/upstream lists in the UI using JavaScript. I'm not really sure how to do that with Cypher.
    Also, storing 500,000 objects and filtering them in the UI doesn't sound right.

  2. Pre-compute the downstream/upstream queries for each node and cache the results in a separate fast key-value store. I'm thinking of a NoSQL store such as MongoDB.
    I would then request the nodes from the graph and get the dependencies from the key-value store with 1 query (much faster, no graph traversal).
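To make option 1 more concrete, here is a minimal sketch of how the downstream/upstream lists could be derived from a flat edge list, either in the middleware or in the UI. It is only an illustration under assumptions: the `{ from, to }` edge shape, the lowercase `id` field, the numeric ids, and the function names `buildAdjacency`, `reachable` and `withDependencies` are all made up for this example and are not part of the actual setup.

```javascript
// Build forward and backward adjacency lists from a flat edge list.
// Assumed edge shape (hypothetical): { from: <node id>, to: <node id> }.
function buildAdjacency(edges) {
  const next = new Map(); // node id -> ids it points to
  const prev = new Map(); // node id -> ids that point to it
  for (const { from, to } of edges) {
    if (!next.has(from)) next.set(from, []);
    if (!prev.has(to)) prev.set(to, []);
    next.get(from).push(to);
    prev.get(to).push(from);
  }
  return { next, prev };
}

// Collect every node reachable from `id` by following `adjacency`
// (iterative depth-first search, so deep graphs do not overflow the stack).
function reachable(id, adjacency) {
  const seen = new Set();
  const stack = [...(adjacency.get(id) || [])];
  while (stack.length > 0) {
    const current = stack.pop();
    if (seen.has(current)) continue;
    seen.add(current);
    for (const neighbour of adjacency.get(current) || []) stack.push(neighbour);
  }
  seen.delete(id); // only matters if the graph contains a cycle
  return [...seen].sort((a, b) => a - b); // assumes numeric ids
}

// Produce one entry per node in a shape similar to the response above.
function withDependencies(nodeIds, edges) {
  const { next, prev } = buildAdjacency(edges);
  return nodeIds.map((id) => ({
    id,
    downstreamIds: reachable(id, next),
    upstreamIds: reachable(id, prev),
  }));
}

// Usage with a tiny example graph: 1 -> 2 -> 4 -> 5 and 1 -> 3 -> 4.
const edges = [
  { from: 1, to: 2 },
  { from: 1, to: 3 },
  { from: 2, to: 4 },
  { from: 3, to: 4 },
  { from: 4, to: 5 },
];
const nodes = withDependencies([1, 2, 3, 4, 5], edges);
console.log(nodes[3]); // { id: 4, downstreamIds: [ 5 ], upstreamIds: [ 1, 2, 3 ] }
```

With this approach the database would only need to return the nodes and the edges of the sub-graph, and each node's lists come from two in-memory traversals instead of two graph queries.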

Which is better? Any other solutions?

Upvotes: 1

Views: 215

Answers (1)

Meligy

Reputation: 36594

Option 1 is a no-no. Browser-side JavaScript is not a good fit for this amount of data. It might be possible if you have an edge server (a light API between the frontend and the DB) that does that work instead.

That leaves option 2 as the only viable one of those you suggested.
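As a rough sketch of what option 2 could look like, assuming the list of start/end pairs really is small enough to pre-compute: the `Map` below is only a stand-in for an external key-value store, and `loadSubgraph` is a hypothetical async function that runs the slow graph queries and returns the finished response. None of these names come from the question.

```javascript
// In-memory stand-in for a key-value store (an assumption for this sketch).
const cache = new Map();

// One cache entry per start/end pair.
const cacheKey = (startId, endId) => `${startId}:${endId}`;

// Pre-compute the response for every known pair, e.g. at startup or
// after the graph changes. `loadSubgraph` is a hypothetical slow loader.
async function warmCache(pairs, loadSubgraph) {
  for (const [startId, endId] of pairs) {
    cache.set(cacheKey(startId, endId), await loadSubgraph(startId, endId));
  }
}

// Serve from the cache; on a miss, fall back to the slow path once
// and remember the result for later requests.
async function getSubgraph(startId, endId, loadSubgraph) {
  const key = cacheKey(startId, endId);
  if (!cache.has(key)) {
    cache.set(key, await loadSubgraph(startId, endId));
  }
  return cache.get(key);
}
```

The trade-off is staleness: cached entries have to be refreshed or invalidated whenever the underlying graph changes.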

But I also don't know Neo4j well. What you are talking about sounds like a normal scenario for a graph database. I think you should be able to re-design the schema and/or queries to get this to work in a performant manner.

Upvotes: 1
