The Setup
I have a .NET application that uses the Neo4j client library to perform CRUD operations on a backend Neo4j database. The nodes and relationships in this database represent formulas & parameters.
There are two node labels: BusinessRule and ChartField.
For each BusinessRule node, there will be a single OUTPUTS relationship to a ChartField node, and there could be one-to-many USES relationships to other ChartField nodes that represent the parameters for the BusinessRule formula.
When a user finishes configuring all their rules, they publish their changes, which will rebuild the graph.
The Problem
I'm struggling with the performance of adding the relationships in the graph.
In a single "publish", I may have 3,000 distinct BusinessRule nodes. Adding all of the BusinessRule and ChartField nodes happens quickly, and performance there is not an issue.
But adding the relationships for each of the 3,000 BusinessRule nodes is taking a very long time.
Below is an example of the Cypher query that would add the relationships for a single BusinessRule. It has to run 3,000 times to complete the task. I do have an index on the Id property for both node types.
MATCH (p1:BusinessRule {Id: '2025-BUDGET-10000184-11061345'})
MATCH (t1:ChartField {Id: '2025-BUDGET-11061345'})
MATCH (v1:ChartField {Id: '2025-BUDGET-11061472'})
MATCH (v2:ChartField {Id: '2025-BUDGET-11062722'})
CREATE (p1)-[:OUTPUTS {Type: 'OUTPUTS', TargetCFID: 11061345, BusinessRuleId: '2025-BUDGET-10000184-11061345'}]->(t1)
CREATE (p1)-[:USES {Type: 'USES', ParamCFID: 11061472, BusinessRuleId: '2025-BUDGET-10000184-11061345'}]->(v1)
CREATE (p1)-[:USES {Type: 'USES', ParamCFID: 11062722, BusinessRuleId: '2025-BUDGET-10000184-11061345'}]->(v2)
This query works by matching the relevant BusinessRule and ChartField nodes, where p1 is the BusinessRule, t1 is the target ChartField node, and v* are the ChartField parameters. Then, with those nodes matched, we can add the relationships.
Does anyone have any suggestions on how to speed up this process? It took almost 22 minutes to execute this Cypher query 3,000 times.
I've considered batching up to 100 of these queries together to save on the network round trips, but that gets a bit complex, as the alias names all have to be unique across the batch.
In some batching tests, I saw some improvement, but nothing significant.
The Answer
Making thousands of separate queries is definitely not good practice. Not only does it involve a lot of unnecessary networking/transaction overhead, but the Neo4j server will have to parse and generate a new plan for every query.
You should be able to create all the relationships easily and efficiently with a single query.
First, create a 3000-element list of maps, where each map has the format:
{p1: '2025-BUDGET-10000184-11061345',
t1: '2025-BUDGET-11061345', t1CFID: '...', t1Br: '...',
v1: '...', v1CFID: '...', v1Br: '...',
v2: '...', v2CFID: '...', v2Br: '...'}
Then, pass that list as a $data parameter to this query:
UNWIND $data AS d
MATCH (p1:BusinessRule {Id: d.p1})
MATCH (t1:ChartField {Id: d.t1})
MATCH (v1:ChartField {Id: d.v1})
MATCH (v2:ChartField {Id: d.v2})
CREATE (p1)-[:OUTPUTS {Type: 'OUTPUTS', TargetCFID: d.t1CFID, BusinessRuleId: d.t1Br}]->(t1)
CREATE (p1)-[:USES {Type: 'USES', ParamCFID: d.v1CFID, BusinessRuleId: d.v1Br}]->(v1)
CREATE (p1)-[:USES {Type: 'USES', ParamCFID: d.v2CFID, BusinessRuleId: d.v2Br}]->(v2)
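To make the shape of $data concrete, here is a minimal sketch of building the list and submitting the single query. It is written in Python for brevity (the same structure applies to the .NET Neo4j client), and it assumes each rule record carries its own id, its target ChartField, and exactly two parameter ChartFields, matching the v1/v2 slots in the query above; the record field names are hypothetical.

```python
def build_data(rules):
    """Build the list of maps passed to the batched query as $data.

    Each input record is assumed to look like:
      {"rule_id": ..., "target_id": ..., "target_cfid": ...,
       "param1_id": ..., "param1_cfid": ...,
       "param2_id": ..., "param2_cfid": ...}
    """
    data = []
    for r in rules:
        data.append({
            "p1": r["rule_id"],
            "t1": r["target_id"], "t1CFID": r["target_cfid"], "t1Br": r["rule_id"],
            "v1": r["param1_id"], "v1CFID": r["param1_cfid"], "v1Br": r["rule_id"],
            "v2": r["param2_id"], "v2CFID": r["param2_cfid"], "v2Br": r["rule_id"],
        })
    return data


def publish(driver, rules, query):
    """Run the single UNWIND query, passing all 3,000 rules at once."""
    with driver.session() as session:
        # One network round trip, one query plan, one transaction.
        session.run(query, data=build_data(rules))
```

The key point is that the server parses and plans the query once, and the per-rule values travel as plain parameter data rather than as 3,000 separately compiled query strings.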
Since your $data list would only have 3,000 elements, there should be no issue with the transaction running out of memory. But if you had a very large amount of data, then you should consider breaking it down into manageable chunks.
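Chunking the list amounts to slicing it into fixed-size pieces and running the same query once per piece, each in its own transaction. A minimal sketch, assuming a hypothetical chunk size of 1,000:

```python
def chunked(data, size=1000):
    """Yield successive fixed-size slices of the $data list."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

# Each chunk would then be passed as the $data parameter of its own
# transaction, keeping per-transaction memory bounded while still
# amortizing the query-planning cost over many rules.
```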